Find the most common "3-page path" on a website, given a large data log.
Sigiloso
Quick and dirty:

* Build a list of URL/timestamp pairs per user.
* Sort each list by timestamp.
* Iterate over each list:
  * for each 3-URL sequence, create or increment a counter.
* Find the highest count in the URL-sequence count list.

In pseudocode:

    foreach (entry in parsedLog) {
        users[entry.user].urls.add(entry.time, entry.url)
    }

    foreach (user in users) {
        user.urls.sort()   // by timestamp
        for (i = 0; i < user.urls.length - 2; i++) {
            key = createKey(user.urls[i], user.urls[i+1], user.urls[i+2])
            sequenceCounts.incrementOrCreate(key)
        }
    }

    sequenceCounts.sortDesc()
    largestCountKey = sequenceCounts[0]
    topUrlSequence = parseKey(largestCountKey)

Found on this site: http://stackoverflow.com/questions/2991480/most-frequent-3-page-sequence-in-a-weblog
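The same quick-and-dirty approach can be sketched as runnable Python. This is a minimal sketch, not the answer's own code: it assumes each parsed log entry is a `(user, timestamp, url)` tuple, and the function name `most_common_three_page_path` and the sample log are hypothetical. It keeps the original steps (group by user, sort by time, count consecutive triples, take the top count).

```python
from collections import Counter, defaultdict
from typing import Iterable, Optional, Tuple

# Hypothetical record shape for a parsed log line: (user, timestamp, url).
LogEntry = Tuple[str, float, str]


def most_common_three_page_path(
    log: Iterable[LogEntry],
) -> Optional[Tuple[Tuple[str, str, str], int]]:
    """Return (most frequent 3-URL sequence, its count), or None if no user saw 3+ pages."""
    # Step 1: build a list of (timestamp, url) pairs per user.
    visits = defaultdict(list)
    for user, timestamp, url in log:
        visits[user].append((timestamp, url))

    # Steps 2-3: sort each user's list by timestamp, then count every
    # consecutive 3-URL window. A tuple serves as the key, so no
    # createKey/parseKey string encoding is needed.
    counts = Counter()
    for user_visits in visits.values():
        user_visits.sort(key=lambda visit: visit[0])
        urls = [url for _, url in user_visits]
        for i in range(len(urls) - 2):
            counts[(urls[i], urls[i + 1], urls[i + 2])] += 1

    # Step 4: find the highest count. most_common(1) returns only the top
    # entry instead of sorting every counter in descending order.
    if not counts:
        return None
    return counts.most_common(1)[0]


# Hypothetical sample log; entries need not arrive in time order.
log = [
    ("alice", 3, "/cart"),
    ("alice", 1, "/home"),
    ("alice", 2, "/products"),
    ("bob", 5, "/home"),
    ("bob", 6, "/products"),
    ("bob", 7, "/cart"),
    ("bob", 8, "/checkout"),
    ("carol", 1, "/home"),
]

print(most_common_three_page_path(log))  # prints (('/home', '/products', '/cart'), 2)
```

Here alice and bob both follow `/home` → `/products` → `/cart`, while carol visits only one page and contributes no sequence. Like the pseudocode, this holds every user's visits in memory at once.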