Pergunta de entrevista da empresa Google

Given X number of search terms, write an algorithm that will return the smallest substring from an article that contains all of the search terms.

Respostas da entrevista

Sigiloso

3 de mai. de 2013

2 Rahul: it looks like the longest substring to me. One solution could be as follows: 1. Go through the text and create a list of indices for each term. Here the choice of a string search algorithm is important. For multiple patterns in the same text, the Rabin-Karp algorithm might be a reasonable choice, because it finds all k terms in O(n+k) time. 2. Go through all m instances/indices of all terms. At every index (belongs to one of the terms), find the closest instances of all other terms (prev or next index in their lists), thus obtaining the length of a substring with all terms and storing its minimum. This can be done in O(m k). Not sure, but maybe p.2 can be combined with p.1 or improved or eliminated. Another concern is whether it's one-time task or not, meaning if you constantly have to solve this for numerous sets of terms, then completely another (and probably more complicated) solution might be faster - something with text pre-processing, likely involving suffix trees/arrays, LCP arrays, etc.

Sigiloso

24 de abr. de 2013

Here's what I would do. Traverse the article identifying each search term as you go - this should be quick enough. As each keyword is identified, mark it's latest position. Keep track of which keyword was seen farthest back from our current position. The first time all search terms are identified, record the indices of the first and last terms as well as the size of the substring. Each time the farthest back term changes, check the current substring size against the recorded size. If it's smaller, update the recorded indices and size. This process could get bogged down if there are a large number of search terms.

Sigiloso

1 de mai. de 2013

public static String smallestSubset(String article, String[] terms) { if (null == article) return null; if (null == terms) return null; int start = article.length() - 1; int stop = 0; for (String term : terms) { int termStart = article.indexOf(term); int termStop = termStart + term.length() - 1; start = Math.min(start, termStart); stop = Math.max(stop, termStop); } if (stop < start) return ""; return article.substring(start, stop+1); }

2