Interview question from Huawei Technologies

Explain self-attention. How is it different from attention? How is it used in transformers? There were also some questions about inverted dropout. How can one make sure that a certain number of parameters in a deep network are trained?
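
For reference, here is a minimal sketch of two of the topics asked about, not an official answer. It shows scaled dot-product self-attention, where queries, keys, and values are all projections of the same sequence (in general attention the queries may come from a different sequence than the keys and values), and inverted dropout, which rescales the kept activations by 1/keep_prob during training so that nothing needs rescaling at inference. All names (`self_attention`, `inverted_dropout`, `w_q`, `w_k`, `w_v`, `keep_prob`) are illustrative assumptions, and the code uses plain Python lists rather than any particular framework.

```python
import math
import random


def softmax(row):
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    # Naive matrix product of two lists of lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def self_attention(x, w_q, w_k, w_v):
    # Queries, keys, and values are all computed from the same input x;
    # that is what makes this *self*-attention.
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    # Scale by sqrt(d_k), then normalize each row into attention weights.
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


def inverted_dropout(activations, keep_prob, rng, training=True):
    # At inference time the activations pass through unchanged.
    if not training:
        return list(activations)
    # Keep each unit with probability keep_prob and scale it by 1/keep_prob,
    # so the expected value of each activation is preserved.
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


# Toy input: 3 tokens with 2 features each, identity projections (assumed values).
identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = self_attention(x, identity, identity, identity)

rng = random.Random(0)
dropped = inverted_dropout([1.0] * 10000, keep_prob=0.8, rng=rng)
```

Each row of `weights` sums to 1 and says how much one token attends to every token in the same sequence. A transformer layer runs several such attention heads in parallel and concatenates their outputs.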