How would you benchmark an object detection model acting over a vide stream? For example, a model that is meant to detect alcohol consumption scenes in a video.

Question

Sigiloso · Accepted Answer

Looking at the frame-by-frame predictions would not work very well, primarily due to the class imbalance (most objects are not present in most frames). Visually one can plot the predicted probability that an object is present over time - vs the ground truth label, and one can quantify the overlap via PR AUC which is more robust to class imbalance than ROC AUC, and without a need to condition on a given threshold.

Channel 4

Pergunta de entrevista da empresa Channel 4

Resposta da entrevista

Empresas seguidas

Buscas de vagas