Vanishing gradients problem in deep learning

Question

Anonymous · Accepted Answer

When the gradient has to flow back through many layers of transformations, the chain rule turns it into a product of many Jacobians, one per layer. If the eigenvalues (more precisely, the singular values) of those Jacobians are mostly below 1, the product of that many fractions is extremely small, so the early layers receive tiny gradients and training becomes slow or gets stuck.
Common fixes are skip connections, weight initialization that keeps the Jacobian eigenvalues roughly centered around 1, and activations such as ReLU or tanh instead of sigmoid (the sigmoid derivative never exceeds 0.25, while the tanh and ReLU derivatives can reach 1).
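
To make the "product of fractions" concrete, here is a minimal sketch in plain Python. It is a toy, not a real network: every layer is a single scalar unit with a fixed weight, so each per-layer Jacobian is just one number. The function name `input_gradient` and the values `w=1.0`, `x=0.5` and `depth=30` are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # never exceeds 0.25

def relu(z):
    return max(0.0, z)

def d_relu(z):
    return 1.0 if z > 0 else 0.0

def input_gradient(act, d_act, depth, w=1.0, x=0.5, skip=False):
    """Return d(output)/d(input) for a toy chain of scalar layers."""
    h, grad = x, 1.0
    for _ in range(depth):
        z = w * h
        local = w * d_act(z)   # 1x1 "Jacobian" of this layer
        if skip:
            local += 1.0       # identity path of the residual form h + act(w*h)
            h = h + act(z)
        else:
            h = act(z)
        grad *= local          # chain rule: multiply the per-layer factors
    return grad

sig_grad = input_gradient(sigmoid, d_sigmoid, depth=30)
relu_grad = input_gradient(relu, d_relu, depth=30)
skip_grad = input_gradient(sigmoid, d_sigmoid, depth=30, skip=True)

# sig_grad is a product of 30 factors, each at most 0.25, so it is vanishingly small.
# relu_grad is a product of ones here, because the unit stays in its active region.
# skip_grad is a product of factors of the form 1 + something non-negative.
print(sig_grad, relu_grad, skip_grad)
```

In this toy setup the plain sigmoid chain multiplies thirty factors that are each at most 0.25, while the ReLU chain and the skip-connection chain keep every factor at or above 1, which is the effect the fixes above aim for. A real network has matrix Jacobians and learned weights, so treat this only as an illustration of the scaling argument.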

C3 AI

Interview question from C3 AI
