difference between correlation and independence, k-means clustering pros and cons

Question

Anonymous · Accepted Answer

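correlation: normalized covariance, where cov(X, Y) = E[XY] - E[X]E[Y]. It can be zero even if the variables are not independent; you can usually set up a tricky random variable that satisfies this, e.g. X symmetric about zero and Y = X^2.
independence: p(x, y) = p(x)p(y) for all x, y. This automatically implies zero covariance (plug it in: E[XY] factors into E[X]E[Y]).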
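
To make the "tricky rv" concrete, here is a small sketch using exact fractions. The choice of X uniform on {-1, 0, 1} with Y = X^2 is my own illustrative assumption, not something the interviewer specified; the helper name `expect` is likewise made up for this example.

```python
from fractions import Fraction
from itertools import product

# Assumed toy distribution: X uniform on {-1, 0, 1}, Y = X**2 (fully dependent on X).
p_x = {x: Fraction(1, 3) for x in (-1, 0, 1)}
joint = {(x, x * x): p for x, p in p_x.items()}  # p(x, y)

def expect(f):
    """Expectation of f(x, y) under the joint distribution."""
    return sum(p * f(x, y) for (x, y), p in joint.items())

# cov(X, Y) = E[XY] - E[X]E[Y]
cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Marginal distribution of Y.
p_y = {}
for (x, y), p in joint.items():
    p_y[y] = p_y.get(y, Fraction(0)) + p

# Independence requires p(x, y) == p(x) * p(y) for every (x, y) pair.
independent = all(
    joint.get((x, y), Fraction(0)) == p_x[x] * p_y[y]
    for x, y in product(p_x, p_y)
)

print("cov =", cov, "| independent =", independent)
```

Here the covariance is exactly zero, yet the independence check fails: p(0, 0) = 1/3 while p(x=0)p(y=0) = 1/9.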

k-means:
pros: good when you know the number of distinct clusters and there isn't too much overlap between them. Run-time assignment is pretty fast: just compare the point to the centroids, O(num_means * num_dimensions). Interpretable, and can use custom distance functions (though the mean update is only guaranteed to reduce the objective for squared Euclidean distance).
cons: needs a distance function, which is hard when features are on differing magnitudes (rescale first). Has to be trained, and training is always an approximation because the optimal solution is NP-hard. Training terminates but only at a local optimum, so bad initial points can make the clusters bad. Hard to tell how many clusters is sufficient, and it cannot model complex clusters (think clusters of concentric rings).
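
A minimal sketch of the standard Lloyd iteration for k-means, in plain Python, to show where the run-time cost and the initialization sensitivity come from. The function names (`sq_dist`, `assign`, `kmeans`), the toy points, and the random-sample initialization are assumptions for illustration only; a real project would more likely use a library implementation.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def assign(point, centroids):
    """Index of the nearest centroid: O(num_means * num_dimensions)."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))

def kmeans(points, k, n_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random init: results depend on this choice
    labels = None
    for _ in range(n_iter):
        new_labels = [assign(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stable: a local optimum, not necessarily the global one
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

# Two well-separated toy blobs (made-up data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

On well-separated blobs like these the two groups are recovered regardless of the starting points. On concentric rings the same procedure would split the plane by nearest centroid and cut across both rings, which is the limitation mentioned in the cons.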

C3 AI

Interview question from C3 AI