Pergunta de entrevista da empresa Amazon

You have dictionary. How would you design function/system that should return true/false for check if a word is in a database? How would you scale your solution if word db does not fit in memory/disk? How would you scale it to really big db of words that should be located on n computers?

Respostas da entrevista

Sigiloso

22 de jan. de 2011

For dictionaries that fit in memory, a hash table will suffice. For larger dictionaries, you can use a bloom filter backed by an on-disk b-tree. To scale to multiple computers, you can hash the search word and use a token ring to consistently determine which node that word would exist on, then on each individual node use the bloom filter and b-tree.

3

Sigiloso

22 de jan. de 2011

You'd enter the words into a binary search tree, which should be able to search down to a word in log(n) time, with n being the total number of words. If your db doesn't fit on one PC, then you'd place subtrees on different machines and have a master index, indicating which machine had which letter or prefix. For example, if you had 26 machines available, you could put a piece of the tree that corresponds to a letter on each machine. You'd structure the tree so the top node was as close to the middle of the set of possible values on each machine. While this would take some preparation, you'd get fast searches and relatively easy insertions of new nodes.

2