Interview question from Tanla Platforms

Explain why encoder models need multi-head attention. Why can't one use discrete numbers (raw position indices such as 0, 1, 2, ...) as positional embeddings in transformers?
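To make both questions concrete, here is a minimal pure-Python sketch. It uses no tensor library, so it runs anywhere. It assumes random Gaussian matrices as stand-ins for learned projections, a tiny model width (`d_model = 8`, 2 heads), and hypothetical helper names.

The first part splits the projected queries, keys and values into per-head slices, so each head computes its own softmax attention pattern. A single head would have to squeeze every kind of relation between tokens into one weight matrix.

The second part uses the sinusoidal encoding from "Attention Is All You Need" (Vaswani et al., 2017). Raw indices 0, 1, 2, ... grow without bound, so they would quickly dominate embedding values of order 1. They would also take magnitudes at inference that training never saw on longer inputs. In contrast, every sinusoidal component stays in [-1, 1], and `PE(pos + k)` is a fixed rotation of `PE(pos)`, which makes relative offsets easy to pick up.

```python
import math
import random

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); both are lists of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def softmax(row):
    mx = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - mx) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention for one head; returns (output, weights)."""
    d = len(q[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v), weights

def multi_head_attention(x, wq, wk, wv, wo, num_heads):
    """Project x, split the model dimension into per-head slices, attend per head, concat, project."""
    d_model = len(x[0])
    assert d_model % num_heads == 0
    d_head = d_model // num_heads
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    head_outputs, head_weights = [], []
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)  # this head's slice of the features
        out, w = attention([r[cols] for r in q], [r[cols] for r in k], [r[cols] for r in v])
        head_outputs.append(out)
        head_weights.append(w)
    # Concatenate the heads' outputs token by token, then mix them with wo.
    concat = [sum((head_outputs[h][i] for h in range(num_heads)), []) for i in range(len(x))]
    return matmul(concat, wo), head_weights

def sinusoidal_encoding(pos, d_model):
    """PE(pos, 2i) = sin(pos / 10000^(2i/d)), PE(pos, 2i+1) = cos(same); d_model must be even."""
    pe = []
    for i in range(0, d_model, 2):
        angle = pos / (10000 ** (i / d_model))
        pe.extend([math.sin(angle), math.cos(angle)])
    return pe

def shift_encoding(pe, k, d_model):
    """Rotate each (sin, cos) pair by its frequency times k: gives PE(pos + k) without knowing pos."""
    out = []
    for i in range(0, d_model, 2):
        w = 1 / (10000 ** (i / d_model))
        s, c = pe[i], pe[i + 1]
        out.append(s * math.cos(w * k) + c * math.sin(w * k))
        out.append(c * math.cos(w * k) - s * math.sin(w * k))
    return out

random.seed(0)
d_model, num_heads, seq_len = 8, 2, 4

def rand_matrix(rows, cols):
    # Hypothetical stand-in for learned weights / token embeddings.
    return [[random.gauss(0, 0.5) for _ in range(cols)] for _ in range(rows)]

x = rand_matrix(seq_len, d_model)
wq, wk, wv, wo = (rand_matrix(d_model, d_model) for _ in range(4))
out, weights = multi_head_attention(x, wq, wk, wv, wo, num_heads)

# Each head produces its own attention pattern over the same tokens.
for h, w in enumerate(weights):
    print("head", h, [round(p, 2) for p in w[0]])

# A raw index like 1000 is far larger than any embedding entry,
# while every sinusoidal component stays within [-1, 1].
print(max(abs(val) for row in x for val in row), 1000.0)
print(max(abs(val) for val in sinusoidal_encoding(1000, d_model)))
```

Printing `weights[0]` and `weights[1]` should show two different attention matrices for the same input, which is the point of having several heads; `shift_encoding` illustrates why a fixed offset `k` is a linear operation on sinusoidal encodings but not something a single raw integer exposes.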