Analyze User Website Pattern

Patrick Leaton
Problem Description

You are given two string arrays username and website and an integer array timestamp. All the given arrays are of the same length and the tuple [username[i], website[i], timestamp[i]] indicates that the user username[i] visited the website website[i] at time timestamp[i].

pattern is a list of three websites (not necessarily distinct).

  • For example, ["home", "away", "love"]["leetcode", "love", "leetcode"], and ["luffy", "luffy", "luffy"] are all patterns.

The score of a pattern is the number of users that visited all the websites in the pattern in the same order they appeared in the pattern.

  • For example, if the pattern is ["home", "away", "love"], the score is the number of users x such that x visited "home" then visited "away" and visited "love" after that.
  • Similarly, if the pattern is ["leetcode", "love", "leetcode"], the score is the number of users x such that x visited "leetcode" then visited "love" and visited "leetcode" one more time after that.
  • Also, if the pattern is ["luffy", "luffy", "luffy"], the score is the number of users x such that x visited "luffy" three different times at different timestamps.

Return the pattern with the largest score. If there is more than one pattern with the same largest score, return the lexicographically smallest such pattern.

 

Example 1:

Input: username = ["joe","joe","joe","james","james","james","james","mary","mary","mary"], timestamp = [1,2,3,4,5,6,7,8,9,10], website = ["home","about","career","home","cart","maps","home","home","about","career"]
Output: ["home","about","career"]
Explanation: The tuples in this example are:
["joe","home",1],["joe","about",2],["joe","career",3],["james","home",4],["james","cart",5],["james","maps",6],["james","home",7],["mary","home",8],["mary","about",9], and ["mary","career",10].
The pattern ("home", "about", "career") has score 2 (joe and mary).
The pattern ("home", "cart", "maps") has score 1 (james).
The pattern ("home", "cart", "home") has score 1 (james).
The pattern ("home", "maps", "home") has score 1 (james).
The pattern ("cart", "maps", "home") has score 1 (james).
The pattern ("home", "home", "home") has score 0 (no user visited home 3 times).

Example 2:

Input: username = ["ua","ua","ua","ub","ub","ub"], timestamp = [1,2,3,4,5,6], website = ["a","b","a","a","b","c"]
Output: ["a","b","a"]

 

Constraints:

  • 3 <= username.length <= 50
  • 1 <= username[i].length <= 10
  • timestamp.length == username.length
  • 1 <= timestamp[i] <= 10^9
  • website.length == username.length
  • 1 <= website[i].length <= 10
  • username[i] and website[i] consist of lowercase English letters.
  • It is guaranteed that there is at least one user who visited at least three websites.
  • All the tuples [username[i], timestamp[i], website[i]] are unique.

 

The description was taken from https://leetcode.com/problems/analyze-user-website-visit-pattern/.

Problem Solution

from itertools import combinations
from collections import defaultdict
#O(U+T+3^W) Time, O(U+T+3^W) Space
class Solution:
    def mostVisitedPattern(self, username: List[str], timestamp: List[int], website: List[str]) -> List[str]:
        output = []
        adjacency, seen = defaultdict(list), defaultdict(int)
        web_info = sorted(zip(username, timestamp, website), key = lambda x: x[1])
       
        for user, _, site in web_info:
            adjacency[user].append(site)
   
        for visits in adjacency.values():
            for pattern in set(itertools.combinations(visits, 3)):
                seen[pattern]+=1
               
        max_score = max(seen.values())
        for pattern in seen:
            if seen[pattern] == max_score:
                output.append(pattern)
               
        return sorted(output)[0]

Problem Explanation


If we are given a list of timestamps, we're going to want to sort those so that we can get an accurate order of events.

From there, we can build an adjacency list to make a graph that maps each user to the sites they visited.

After, we will need to generate each order combination for each user and count how many times we see each website visit combination to find the most visited pattern of three websites.  From there, we will sort the output if there are duplicates and return the first website pattern with the most visits and has the smallest order lexicographically.


Let's start by initializing our output, adjacency list, and seen website pattern counter.

        output = []
        adjacency, seen = defaultdict(list), defaultdict(int)

 

Next, we will make a list of triplet tuples that contain each user, the timestamp that they visited a website, and the website.

We will want to sort this based on the timestamps so that we can get these visits in order.

        web_info = sorted(zip(username, timestamp, website), key = lambda x: x[1])

 

For each user and site in the web info list, we will create an edge in our adjacency list from the website to the user.

        for user, _, site in web_info:
            adjacency[user].append(site)

 

Here is the tricky part of the problem, and likely the reason why it has ten times the amount of dislikes than likes.

If a user visited multiple websites like Facebook, Youtube, Instagram, eBay, then the combinations of patterns could be:

Facebook, Youtube, Instagram,

Youtube, Instagram, eBay,

Facebook, Instagram, eBay,

Facebook, Youtube, eBay,

 

Basically, just any three websites that were in order but not strictly consecutive.

To generate all of these combinations, we could either make our own backtracking function or we could use the itertools combinations library.  This will allow us to have a custom iterator for the given list and size to generate these combinations for us.

We will traverse each user's website visits, convert them into a set since they could have gone to the same site multiple times, and we will track how many times we have seen the website visit pattern combination.

        for visits in adjacency.values():
            for pattern in set(itertools.combinations(visits, 3)):
                seen[pattern]+=1

 

Once we have counted each website visit pattern, we will see what the maximum score is.

        max_score = max(seen.values())

 

Then, for each pattern, if it has the maximum score, we will append it to the output.

        for pattern in seen:
            if seen[pattern] == max_score:
                output.append(pattern)

 

Once we have finished, all we need to do is sort the output and get the first pattern.

        return sorted(output)[0]