C3 AI interview question

How does a random forest calculate probabilities? What are the pros/cons of different methods for initializing values in k-means clustering?

Interview answer

Anonymous

Dec 9, 2022

K-means clustering depends on the dissimilarity function. If one uses the L2 norm (squared Euclidean distance, as standard k-means does), one has to be aware that k-means will be sensitive to outliers: each centroid is the mean of its assigned points, so a single distant point can pull it far from the bulk of its cluster. Second, it assumes that the clusters are roughly "circular" (spherical), which may or may not be true. A better approach is to use self-organizing maps or Gaussian mixture models. The advantage of k-means clustering is that it is easy to implement, and it does give good results when clustering text (this is from my experience).
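To make the outlier point concrete, here is a minimal sketch of plain Lloyd's k-means under squared L2 distance, run on hypothetical toy data (two tight 2-D blobs plus one far-away point). The function names, data, and fixed initial centroids are illustrative assumptions, not anything from the original answer. The coordinate-wise median is included only as a comparison (a k-medians-style update, which corresponds to an L1 dissimilarity), not as the answerer's recommended method.

```python
import statistics


def squared_l2(a, b):
    """Squared Euclidean (L2) distance, the dissimilarity standard k-means minimizes."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean: the point minimizing total squared L2 within a cluster."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def coordinate_median(points):
    """Coordinate-wise median, shown only for comparison (k-medians-style update)."""
    return tuple(statistics.median(coord) for coord in zip(*points))


def kmeans(points, init_centroids, max_iters=100):
    """Plain Lloyd's algorithm with caller-supplied initial centroids."""
    centroids = [tuple(c) for c in init_centroids]
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid under squared L2.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: squared_l2(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    return centroids


# Hypothetical toy data: two tight 2-D blobs plus one far-away point.
CLUSTER_A = [(0, 0), (1, 0), (0, 1), (1, 1)]
CLUSTER_B = [(10, 10), (11, 10), (10, 11), (11, 11)]
OUTLIER = (10.5, 50.5)
INIT = [(0, 0), (10, 10)]

if __name__ == "__main__":
    print(kmeans(CLUSTER_A + CLUSTER_B, INIT))              # [(0.5, 0.5), (10.5, 10.5)]
    print(kmeans(CLUSTER_A + CLUSTER_B + [OUTLIER], INIT))  # [(0.5, 0.5), (10.5, 18.5)]
    print(coordinate_median(CLUSTER_B + [OUTLIER]))         # (10.5, 11)
```

Adding one point drags the second centroid 8 units along y, away from every real member of that blob, while the coordinate-wise median of the same five points barely moves. This is the sensitivity that comes from averaging under L2.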