Pergunta de entrevista da empresa Microsoft

Find the 20 longest strings in a text file.

Respostas da entrevista

Sigiloso

28 de ago. de 2012

Another way can be to just traverse the strings and keep a min b-tree to keep the minimum element on the top. When size of tree + 20, then only insert (replace) in tree, if the current string length > root of tree(min) . This will take O(log20) for each insertion and O(N) for traversal.

3

Sigiloso

11 de out. de 2012

In the hash map approach described in the first answer, instead of having the word as the key, we can have the length as the key and value as the list of words with the length pointed by the key. You can use LinkedHashSet if you want sorting. Or you can maintain the largest 20 lengths in a separate data structure. This obviously is not a really optimal solution in terms of space complexity. Instead of that we can maintain a min-heap. Add first 20 words as it is. And then, if the new word has a larger length than the word at the top of the heap (smallest word as it's a min-heap) you remove the word from the heap and insert the new word into the heap. The heap size is always going to be 20 so you don't take much space and don't take much time to insert the word in the heap as well.

1

Sigiloso

28 de out. de 2012

Correction: The min-heap version takes n.log(k) only which is O(n) only. -This looks the best solution so far.

1

Sigiloso

16 de jan. de 2018

Use selection rank algorithm. Expected time complexity o(n)

Sigiloso

25 de mai. de 2019

Call for the intern!

Sigiloso

28 de out. de 2012

The min-heap solutions looks nice..However, it would take O(n.logn) right? On worst case, we do have to insert/delete in min-heap (log n) for every n words. Any better solution exists?? Can we use dynamic programming to find the longest word and then remove the longest word and repeat the routine for 19 more times? That would be 20times O(N) , which is better than O(n.log n). Any comments?

Sigiloso

28 de ago. de 2012

If you don't have to worry about space, you could use a hash with each word being the "key" and its length being the "value". Then sort bu hash by value, and choose the top 20.