Box interview question

Identify where potential race conditions can happen in this hardware & database configuration:

1) validate that folder Z already exists in cache or in the db
2) invalidate cache data for Z
3) INSERT file 'c' into folder Z
4) gather folder Z + children
5) update cache with new information about folder Z

Hardware set-up would be as follows:

                    --- cache machine 1 ---
                  //                        \\
outside world -- ---- cache machine 2 ------- database
                  \\                        //
                    --- cache machine 3 ---

While any machine might be asked about files within folder Z, the actual data of folder Z will be cached on exactly 1 machine out of any of the machines that have cache data.
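To make the question concrete, here is a minimal sketch of the five steps with two writers interleaved by hand. The dictionaries `db` and `cache` and the helpers `steps_1_to_4` and `step_5` are hypothetical stand-ins invented for illustration, not anything from Box's actual system; the interleaving is forced sequentially so the outcome is deterministic.

```python
# Hypothetical in-memory stand-ins for the database and for the one
# cache machine that owns folder Z.
db = {"Z": []}      # folder name -> list of file names
cache = {"Z": []}   # cached copy of folder Z


def steps_1_to_4(folder, new_file):
    """Run steps 1-4 and return the snapshot that step 5 would write."""
    assert folder in cache or folder in db   # step 1: folder exists somewhere
    cache.pop(folder, None)                  # step 2: invalidate cache data
    db[folder].append(new_file)              # step 3: INSERT the file
    return list(db[folder])                  # step 4: gather folder + children


def step_5(folder, snapshot):
    cache[folder] = snapshot                 # step 5: update the cache


# Writer A inserts 'c' and gathers, then stalls before step 5.
snap_a = steps_1_to_4("Z", "c")
# Writer B inserts 'd', gathers, and completes step 5 first.
snap_b = steps_1_to_4("Z", "d")
step_5("Z", snap_b)
# Writer A finally completes step 5 with its older snapshot.
step_5("Z", snap_a)

print(db["Z"])     # ['c', 'd']
print(cache["Z"])  # ['c']
```

The database ends up holding both files while the cache only lists 'c', because nothing orders the two step-5 writes relative to the step-4 reads they are based on.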

Interview answer

Anonymous

Sep 10, 2012

This was an intensely aggravating problem for the interviewers to set up on the whiteboard and then talk me through. Hopefully I'll recap my answers effectively here:

Race condition 1 (assuming an arbitrary amount of time can pass between steps): if one update (call it file C) gets pushed back to the cache more slowly than a second, separate update (call it file D), the cache may end up in an invalid state, because C's older snapshot of folder Z overwrites D's newer one.
Solution: add a rule to step 5 to only update the cache if the timestamp on the update is newer than the timestamp saved with the last cache update.

Race condition 2: when separate files (call them C and D) are added via two separate machines, an update might get sent to the server that reflects only the newly arrived C or D without the other file (whose cache update might still be in the pipeline waiting to be sent).
My (possibly non-optimal) solution to this was to have the cache confirm the current state and contents of folder Z with the database before applying an update to the cache.
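The timestamp rule proposed for race condition 1 could look roughly like the sketch below. The names `FolderCache`, `CacheEntry`, and `update_if_newer` are made up for illustration, and `stamp` is assumed to be a number that grows with every write and is comparable across machines (for example, one assigned by the database). Wall clocks on separate machines can drift, so a database-assigned value is the safer assumption.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    stamp: int        # assumed: increases with every write to the folder
    children: list    # snapshot of the folder's contents


class FolderCache:
    """Hypothetical cache that rejects updates older than what it holds."""

    def __init__(self):
        self._entries = {}

    def update_if_newer(self, folder, stamp, children):
        # A real multi-threaded cache would need this check and the write
        # below to happen atomically (e.g. under a lock).
        current = self._entries.get(folder)
        if current is not None and current.stamp >= stamp:
            return False  # stale update: keep the newer snapshot
        self._entries[folder] = CacheEntry(stamp, list(children))
        return True

    def get(self, folder):
        entry = self._entries.get(folder)
        return None if entry is None else list(entry.children)


cache = FolderCache()
# File D's update (newer) reaches the cache before file C's (older).
applied_d = cache.update_if_newer("Z", stamp=2, children=["c", "d"])
applied_c = cache.update_if_newer("Z", stamp=1, children=["c"])
print(applied_d, applied_c, cache.get("Z"))  # True False ['c', 'd']
```

With this guard the late-arriving update for file C is dropped instead of overwriting the snapshot that already contains both files.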
