Design a photo de-duplication pipeline
Sigiloso
I asked for clarification on the question and on how they define duplicates. Are the files byte-for-byte identical? Or are we talking about deduplicating photos that share similar content, using some more abstract distance metric? The interviewer spent time on the conceptual ML portion once I offered it, which left too little time to actually implement the hashing approach, for which they provided a deliberately poor hash function that produces collisions. Inexperienced interviewers doing a poor job of interviewing.
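For reference, here is a minimal sketch of the exact-match hashing approach that tolerates a collision-prone hash. It is not the code from the interview: `weak_hash` and `find_duplicates` are hypothetical names, the photos are assumed to fit in memory as `bytes`, and "duplicate" is assumed to mean byte-for-byte identical. The idea is to use the hash only to pick a bucket, then confirm each candidate with a full byte comparison so collisions never produce false duplicates.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def weak_hash(data: bytes) -> int:
    """Hypothetical stand-in for a deliberately poor hash (collides often)."""
    return sum(data) % 16


def find_duplicates(
    photos: Dict[str, bytes],
    hash_fn: Callable[[bytes], int] = weak_hash,
) -> List[List[str]]:
    """Return groups of photo names whose contents are byte-for-byte identical."""
    # hash value -> list of (representative bytes, names sharing those bytes)
    buckets = defaultdict(list)
    for name, data in photos.items():
        groups = buckets[hash_fn(data)]
        for representative, names in groups:
            # Same hash is not enough: compare the actual bytes.
            if representative == data:
                names.append(name)
                break
        else:
            # No true match in this bucket, so start a new group.
            groups.append((data, [name]))
    return [
        names
        for groups in buckets.values()
        for _, names in groups
        if len(names) > 1
    ]


if __name__ == "__main__":
    sample = {
        "a.jpg": b"\x01\x02",
        "b.jpg": b"\x02\x01",  # collides with a.jpg under weak_hash, different bytes
        "c.jpg": b"\x01\x02",  # true duplicate of a.jpg
    }
    print(find_duplicates(sample))
```

With a stronger hash the buckets get smaller, but the byte comparison is still what makes the result correct.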