Design Search Autocomplete System

Patrick Leaton

Published: June 12, 2021, 6:12 p.m.

Problem Description

Design a search autocomplete system for a search engine. Users may input a sentence (at least one word and end with a special character '#').

You are given a string array sentences and an integer array times both of length n where sentences[i] is a previously typed sentence and times[i] is the corresponding number of times the sentence was typed. For each input character except '#', return the top 3 historical hot sentences that have the same prefix as the part of the sentence already typed.

Here are the specific rules:

The hot degree for a sentence is defined as the number of times a user typed the exactly same sentence before.
The returned top 3 hot sentences should be sorted by hot degree (The first is the hottest one). If several sentences have the same hot degree, use ASCII-code order (smaller one appears first).
If less than 3 hot sentences exist, return as many as you can.
When the input is a special character, it means the sentence ends, and in this case, you need to return an empty list.

Implement the AutocompleteSystem class:

AutocompleteSystem(String[] sentences, int[] times) Initializes the object with the sentences and times arrays.
List<String> input(char c) This indicates that the user typed the character c.
- Returns an empty array [] if c == '#' and stores the inputted sentence in the system.
- Returns the top 3 historical hot sentences that have the same prefix as the part of the sentence already typed. If there are fewer than 3 matches, return them all.

Example 1:

Input
["AutocompleteSystem", "input", "input", "input", "input"]
[[["i love you", "island", "iroman", "i love leetcode"], [5, 3, 2, 2]], ["i"], [" "], ["a"], ["#"]]
Output
[null, ["i love you", "island", "i love leetcode"], ["i love you", "i love leetcode"], [], []]

Explanation
AutocompleteSystem obj = new AutocompleteSystem(["i love you", "island", "iroman", "i love leetcode"], [5, 3, 2, 2]);
obj.input("i"); // return ["i love you", "island", "i love leetcode"]. There are four sentences that have prefix "i". Among them, "ironman" and "i love leetcode" have same hot degree. Since ' ' has ASCII code 32 and 'r' has ASCII code 114, "i love leetcode" should be in front of "ironman". Also we only need to output top 3 hot sentences, so "ironman" will be ignored.
obj.input(" "); // return ["i love you", "i love leetcode"]. There are only two sentences that have prefix "i ".
obj.input("a"); // return []. There are no sentences that have prefix "i a".
obj.input("#"); // return []. The user finished the input, the sentence "i a" should be saved as a historical sentence in system. And the following input will be counted as a new search.

Constraints:

n == sentences.length
n == times.length
1 <= n <= 100
1 <= sentences[i].length <= 100
1 <= times[i] <= 50
c is a lowercase English letter, a hash '#', or space ' '.
Each tested sentence will be a sequence of characters c that end with the character '#'.
Each tested sentence will have a length in the range [1, 200].
The words in each input sentence are separated by single spaces.
At most 5000 calls will be made to input.

The description was taken from https://leetcode.com/problems/design-search-autocomplete-system/.

Problem Solution

class TrieNode:

    def __init__(self, val:str) -> None:

        self.val = val

        self.children = {}

        self.end_here = False

        self.visits = 0

        self.sentence = ""

    

class Trie:

    def __init__(self):

        self.root = TrieNode(None)

    

    def insert(self, words:str, times:int) -> None:

        node = self.root

        for char in words:

            if char not in node.children:

                node.children[char] = TrieNode(char)

            node = node.children[char]

        if not node.end_here:

            node.end_here = True

            node.sentence = words

            if times:

                node.visits = times

            else:

                node.visits = 1

        else:

            node.visits += 1

    

    def starts_with(self, words:str) -> TrieNode:

        node = self.root

        for char in words:

            if char not in node.children:

                return None

            node = node.children[char]

        return node

    

    def hottest(self, words:str) -> list[str]:

        seen, output, stack, heap = {}, [], [], []

        prefix = self.starts_with(words)

        if not prefix:

            return None

        stack.append(prefix)

        while stack:

            node = stack.pop()

            if node.end_here:

                seen[node.sentence] = node.visits

            for child in node.children:

                next_node = node.children[child]

                stack.append(next_node)

        for words in seen:

            heapq.heappush(heap, (-seen[words], words))

        hottest = 3

        while heap:

            if not hottest:

                break

            frequency, words = heapq.heappop(heap)

            output.append(words)

            hottest -= 1

        return output

            

class AutocompleteSystem:

    def __init__(self, sentences: List[str], times: List[int]):

        self.trie = Trie()

        self.web_search = ""

        for i in range(len(sentences)):

            self.trie.insert(sentences[i], times[i])

    def input(self, c: str) -> List[str]:

        if c == "#":

            self.trie.insert(self.web_search, 0)

            self.web_search = ""

            return []

        self.web_search += c

        return self.trie.hottest(self.web_search)

Problem Explanation

From the problem description, it is apparent that we will be traversing prefixes. If that is the case, we should implement a Prefix Tree, a Trie.

We will be tracking the count of sentences that have been searched, so we will need some kind of HashMap counter. We will also be returning the top three most frequent sentences so we will need a sorted list or heap for sorting the sentence frequencies.

We also have an article for Implement Trie Prefix Tree and Search Suggestion System that is recommended to be done prior to continuing.

This is going to be a long one.

Let's start out by creating a TrieNode class. This will define the building blocks of our Trie.

class TrieNode:

Each Trie node will have a character value, a dictionary of children for constant lookup, a flag to signal the end of a sentence or word input, a number of visits this node has been traversed, and a sentence that will give us its upper branch in constant time.

def __init__(self, val:str) -> None: self.val = val self.children = {} self.end_here = False self.visits = 0 self.sentence = ""

Next, we will have a Trie class for building the Trie.

class Trie:

A root will be initialized with a null character value once the Trie class has been instantiated.

def __init__(self): self.root = TrieNode(None)

Let's also make a function for inserting character nodes into our Trie.

def insert(self, words:str, times:int) -> None:

We will start by creating a node pointer at the root of the Trie.

Then for each character within the given word(s), if they don't exist within the Trie already, meaning we don't currently have a common prefix in the Trie to this given word(s), we will instantiate a character node and place it in the children of the uncommon prefix branch of the Trie then iterate to it.

node = self.root for char in words: if char not in node.children: node.children[char] = TrieNode(char) node = node.children[char]

Once we have finished placing each character node or traversing them if they already existed, we will see if we have already been here before so that we don't overwrite the node visits.

If we haven't, we will set the current node's end_here flag to true, set its sentence to the given word(s), and if the number of times that was input wasn't zero, we will initialize the number of visits to those times this was search occurred. Otherwise, we will initialize it to one.

if not node.end_here: node.end_here = True node.sentence = words if times: node.visits = times else: node.visits = 1

If we have already set this flag before, we will just increment our visit count by one.

else: node.visits += 1

We could also use a starts_with function to traverse prefixes and return the last node of the prefix so that we can find what similar words or sentences come after it within the Trie inside the node's children.

def starts_with(self, words:str) -> TrieNode:

We will initialize a node pointer at the root of the Trie, and for each character node within the given word(s), if they don't exist within the Trie yet then we have nothing to return. Otherwise, we will keep moving down until we have finished the prefix and return the last character node of the prefix.

node = self.root for char in words: if char not in node.children: return None node = node.children[char] return node

Next up is our hottest function. This is the biggest challenge of the problem, this is where we will take in a prefix string and return the top three suffixes we have in our Trie.

def hottest(self, words:str) -> list[str]:

We'll start by initializing a dictionary that will take a count how many times we have seen a word or sentence, an output list for returning the top three searched words or sentences, a stack for a Depth-First Search we will be performing later, and a heap to sort the results.

seen, output, stack, heap = {}, [], [], []

Let's grab the last character node of the given prefix by passing the given input to the starts_with function.

prefix = self.starts_with(words)

If we don't have a prefix that was returned then we have nothing to return.

if not prefix: return None

Otherwise, we will append that node to the stack and begin a DFS to get each suffix word or sentence and to also count these within our dictionary.

stack.append(prefix)

While we still have character nodes on the stack, we will pop a node, see if it is the last node of a previously searched word or sentence that we placed in our Trie, and if it is, we will place it into our seen dictionary with the number of visits it had.

while stack: node = stack.pop() if node.end_here: seen[node.sentence] = node.visits

Afterward, we will place each child character node in the stack.

for child in node.children: next_node = node.children[child] stack.append(next_node)

Once we have grouped each word(s) by frequency, we will go through and push these frequencies and word(s) into a max-heap by negating the frequency values for Python's default min-heap.

for words in seen: heapq.heappush(heap, (-seen[words], words))

Now that we have all of the searches grouped by frequency, we will continuously pop the most frequent search and append the search to the output until our hottest count has been decremented, or the heap has been emptied in case there were fewer than three results.

hottest = 3 while heap: if not hottest: break frequency, words = heapq.heappop(heap) output.append(words) hottest -= 1

Once we have our hottest results, we will return them.

return output

Let's go back to the given class and functions now that we have our helper functions built.

class AutocompleteSystem:

We will initialize a trie and an empty web search, then for each list of sentences and times, we will insert them into the Trie.

def __init__(self, sentences: List[str], times: List[int]): self.trie = Trie() self.web_search = "" for i in range(len(sentences)): self.trie.insert(sentences[i], times[i])

Within our input function, if we have gotten a pound sign then we will insert the web search that we built up in previous function calls, clear the web search and return an empty list.

def input(self, c: str) -> List[str]: if c == "#": self.trie.insert(self.web_search, 0) self.web_search = "" return []

If we have gotten anything else, we will add it to the web search and return the hottest three suffixes that are in the Trie.

self.web_search += c return self.trie.hottest(self.web_search)