How would you design and evaluate an ML model in a production environment where the data distribution changes over time and labeled data is limited?


Anonymous · Accepted Answer

I explained that I would start by explicitly treating the problem as non-stationary and defining the expected sources of drift: data drift (the input distribution shifts) versus concept drift (the relationship between inputs and labels changes). I would design robust validation using time-based splits, introduce lightweight monitoring of input distributions and key prediction metrics, and rely on proxy metrics where labels are delayed or scarce. I emphasized incremental retraining, shadow deployments, and clear rollback criteria, rather than optimizing only offline metrics.
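Time-based validation can be illustrated with a walk-forward split: each fold trains only on rows before a cutoff and evaluates on the next contiguous block, so the model is never scored on data older than what it trained on. This is a minimal sketch, assuming the rows are already sorted by timestamp; the function name `walk_forward_splits` and its parameters are hypothetical, not taken from the answer.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int, n_folds: int, min_train: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs over time-ordered rows.

    Each fold trains on every row before a cutoff and tests on the next
    contiguous block of rows, so no future data leaks into training.
    """
    if n_folds < 1 or min_train < 1 or min_train >= n_samples:
        raise ValueError("need n_folds >= 1 and 1 <= min_train < n_samples")
    test_size = (n_samples - min_train) // n_folds
    if test_size == 0:
        raise ValueError("not enough samples for the requested number of folds")
    for fold in range(n_folds):
        cutoff = min_train + fold * test_size  # the training window grows each fold
        train_idx = list(range(cutoff))
        test_idx = list(range(cutoff, cutoff + test_size))
        yield train_idx, test_idx
```

With 10 rows, 3 folds and `min_train=4`, the folds test on rows 4-5, 6-7 and 8-9 while training on everything earlier. Comparing scores across folds also gives a rough view of how quickly performance decays as the data drifts, which a single random split would hide.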
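Monitoring input distributions needs a drift statistic that does not depend on labels. The answer does not name one; the population stability index (PSI) used here is one common choice and is an assumption of this sketch. It bins a numeric feature at quantiles of a reference window (for example, the training data), then sums (current − reference) × ln(current / reference) over the bin shares. The helper names and the `EPS` floor are hypothetical.

```python
import bisect
import math
from typing import List, Sequence

EPS = 1e-6  # floor for empty bins so the log term stays finite


def _bin_shares(values: Sequence[float], edges: List[float], n_bins: int) -> List[float]:
    """Fraction of `values` falling into each bin defined by the inner `edges`."""
    counts = [0] * n_bins
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    return [max(c / len(values), EPS) for c in counts]


def population_stability_index(
    reference: Sequence[float], current: Sequence[float], n_bins: int = 10
) -> float:
    """PSI of `current` against `reference`, using reference-quantile bins."""
    if not reference or not current or n_bins < 2:
        raise ValueError("need non-empty samples and n_bins >= 2")
    ordered = sorted(reference)
    # n_bins - 1 inner edges at reference quantiles; the outer bins are open-ended
    edges = [ordered[len(ordered) * i // n_bins] for i in range(1, n_bins)]
    ref = _bin_shares(reference, edges, n_bins)
    cur = _bin_shares(current, edges, n_bins)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))
```

An identical window scores 0, and a window shifted well outside the reference range scores far above 0.25. The cut-offs of roughly 0.1 ("watch") and 0.25 ("investigate") often quoted for PSI are industry rules of thumb rather than guarantees, so alert thresholds would need tuning per feature. Because PSI only sees inputs, it detects data drift but not concept drift, which still requires labels or proxy metrics.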
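Shadow deployment with explicit rollback criteria can be sketched as follows: the primary model serves every request, the candidate sees the same traffic but its outputs are only compared, and a fixed rule turns the comparison into a decision. This is an illustrative sketch under stated assumptions. The names `run_shadow` and `ShadowReport`, the use of agreement with the primary as the label-free proxy metric, and all threshold defaults are hypothetical choices, not the answer's specified method.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, List, Sequence, Tuple

Model = Callable[[object], Hashable]


@dataclass
class ShadowReport:
    n_requests: int
    agreement_rate: float        # share of successful shadow calls matching the primary
    candidate_error_rate: float  # share of shadow calls that raised an exception
    decision: str                # "keep_shadowing", "reject", "review" or "promote"


def run_shadow(
    primary: Model,
    candidate: Model,
    requests: Sequence[object],
    min_requests: int = 100,      # hypothetical: evidence needed before deciding
    min_agreement: float = 0.9,   # hypothetical: below this, behaviour changed a lot
    max_error_rate: float = 0.01, # hypothetical: hard rollback criterion
) -> Tuple[List[Hashable], ShadowReport]:
    served: List[Hashable] = []
    agree = errors = 0
    for x in requests:
        y_primary = primary(x)          # only this prediction reaches users
        served.append(y_primary)
        try:
            y_candidate = candidate(x)  # shadow call: compared, never served
        except Exception:
            errors += 1
            continue
        agree += int(y_candidate == y_primary)
    n = len(requests)
    ok = n - errors
    agreement = agree / ok if ok else 0.0
    error_rate = errors / n if n else 0.0
    if n < min_requests:
        decision = "keep_shadowing"  # not enough evidence yet
    elif error_rate > max_error_rate:
        decision = "reject"          # candidate is unsafe to promote
    elif agreement < min_agreement:
        decision = "review"          # large behaviour change: check with labels or a human
    else:
        decision = "promote"
    return served, ShadowReport(n, agreement, error_rate, decision)
```

Writing the thresholds down before the rollout is the point of "clear rollback criteria": the decision is mechanical rather than argued after the fact. Low agreement is routed to review rather than rejection because a better candidate may legitimately disagree with the current model, especially after concept drift.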

Interview question from Rozetka.ua
