Design a search autocomplete system for a search engine. Users may input a sentence (at least one word and end with a special character '#'
).
You are given a string array sentences
and an integer array times
both of length n
where sentences[i]
is a previously typed sentence and times[i]
is the corresponding number of times the sentence was typed. For each input character except '#'
, return the top 3
historical hot sentences that have the same prefix as the part of the sentence already typed.
Here are the specific rules:
3
hot sentences should be sorted by hot degree (The first is the hottest one). If several sentences have the same hot degree, use ASCII-code order (smaller one appears first).3
hot sentences exist, return as many as you can.Implement the AutocompleteSystem
class:
AutocompleteSystem(String[] sentences, int[] times)
Initializes the object with the sentences
and times
arrays.List<String> input(char c)
This indicates that the user typed the character c
.
[]
if c == '#'
and stores the inputted sentence in the system.3
historical hot sentences that have the same prefix as the part of the sentence already typed. If there are fewer than 3
matches, return them all.
Example 1:
Input ["AutocompleteSystem", "input", "input", "input", "input"] [[["i love you", "island", "iroman", "i love leetcode"], [5, 3, 2, 2]], ["i"], [" "], ["a"], ["#"]] Output [null, ["i love you", "island", "i love leetcode"], ["i love you", "i love leetcode"], [], []] Explanation AutocompleteSystem obj = new AutocompleteSystem(["i love you", "island", "iroman", "i love leetcode"], [5, 3, 2, 2]); obj.input("i"); // return ["i love you", "island", "i love leetcode"]. There are four sentences that have prefix "i". Among them, "ironman" and "i love leetcode" have same hot degree. Since ' ' has ASCII code 32 and 'r' has ASCII code 114, "i love leetcode" should be in front of "ironman". Also we only need to output top 3 hot sentences, so "ironman" will be ignored. obj.input(" "); // return ["i love you", "i love leetcode"]. There are only two sentences that have prefix "i ". obj.input("a"); // return []. There are no sentences that have prefix "i a". obj.input("#"); // return []. The user finished the input, the sentence "i a" should be saved as a historical sentence in system. And the following input will be counted as a new search.
Constraints:
n == sentences.length
n == times.length
1 <= n <= 100
1 <= sentences[i].length <= 100
1 <= times[i] <= 50
c
is a lowercase English letter, a hash '#'
, or space ' '
.c
that end with the character '#'
.[1, 200]
.5000
calls will be made to input
.
The description was taken from https://leetcode.com/problems/design-search-autocomplete-system/.
class TrieNode:
def __init__(self, val:str) -> None:
self.val = val
self.children = {}
self.end_here = False
self.visits = 0
self.sentence = ""
class Trie:
def __init__(self):
self.root = TrieNode(None)
def insert(self, words:str, times:int) -> None:
node = self.root
for char in words:
if char not in node.children:
node.children[char] = TrieNode(char)
node = node.children[char]
if not node.end_here:
node.end_here = True
node.sentence = words
if times:
node.visits = times
else:
node.visits = 1
else:
node.visits += 1
def starts_with(self, words:str) -> TrieNode:
node = self.root
for char in words:
if char not in node.children:
return None
node = node.children[char]
return node
def hottest(self, words:str) -> list[str]:
seen, output, stack, heap = {}, [], [], []
prefix = self.starts_with(words)
if not prefix:
return None
stack.append(prefix)
while stack:
node = stack.pop()
if node.end_here:
seen[node.sentence] = node.visits
for child in node.children:
next_node = node.children[child]
stack.append(next_node)
for words in seen:
heapq.heappush(heap, (-seen[words], words))
hottest = 3
while heap:
if not hottest:
break
frequency, words = heapq.heappop(heap)
output.append(words)
hottest -= 1
return output
class AutocompleteSystem:
def __init__(self, sentences: List[str], times: List[int]):
self.trie = Trie()
self.web_search = ""
for i in range(len(sentences)):
self.trie.insert(sentences[i], times[i])
def input(self, c: str) -> List[str]:
if c == "#":
self.trie.insert(self.web_search, 0)
self.web_search = ""
return []
self.web_search += c
return self.trie.hottest(self.web_search)
From the problem description, it is apparent that we will be traversing prefixes. If that is the case, we should implement a Prefix Tree, a Trie.
We will be tracking the count of sentences that have been searched, so we will need some kind of HashMap counter. We will also be returning the top three most frequent sentences so we will need a sorted list or heap for sorting the sentence frequencies.
We also have an article for Implement Trie Prefix Tree and Search Suggestion System that is recommended to be done prior to continuing.
This is going to be a long one.
Let's start out by creating a TrieNode
class. This will define the building blocks of our Trie.
class TrieNode:
Each Trie node will have a character value, a dictionary of children for constant lookup, a flag to signal the end of a sentence or word input, a number of visits this node has been traversed, and a sentence that will give us its upper branch in constant time.
def __init__(self, val:str) -> None:
self.val = val
self.children = {}
self.end_here = False
self.visits = 0
self.sentence = ""
Next, we will have a Trie class for building the Trie.
class Trie:
A root will be initialized with a null character value once the Trie class has been instantiated.
def __init__(self):
self.root = TrieNode(None)
Let's also make a function for inserting character nodes into our Trie.
def insert(self, words:str, times:int) -> None:
We will start by creating a node pointer at the root of the Trie.
Then for each character within the given word(s), if they don't exist within the Trie already, meaning we don't currently have a common prefix in the Trie to this given word(s), we will instantiate a character node and place it in the children of the uncommon prefix branch of the Trie then iterate to it.
node = self.root
for char in words:
if char not in node.children:
node.children[char] = TrieNode(char)
node = node.children[char]
Once we have finished placing each character node or traversing them if they already existed, we will see if we have already been here before so that we don't overwrite the node visits.
If we haven't, we will set the current node's end_here
flag to true, set its sentence to the given word(s), and if the number of times that was input wasn't zero, we will initialize the number of visits to those times this was search occurred. Otherwise, we will initialize it to one.
if not node.end_here:
node.end_here = True
node.sentence = words
if times:
node.visits = times
else:
node.visits = 1
If we have already set this flag before, we will just increment our visit count by one.
else:
node.visits += 1
We could also use a starts_with
function to traverse prefixes and return the last node of the prefix so that we can find what similar words or sentences come after it within the Trie inside the node's children.
def starts_with(self, words:str) -> TrieNode:
We will initialize a node pointer at the root of the Trie, and for each character node within the given word(s), if they don't exist within the Trie yet then we have nothing to return. Otherwise, we will keep moving down until we have finished the prefix and return the last character node of the prefix.
node = self.root
for char in words:
if char not in node.children:
return None
node = node.children[char]
return node
Next up is our hottest function. This is the biggest challenge of the problem, this is where we will take in a prefix string and return the top three suffixes we have in our Trie.
def hottest(self, words:str) -> list[str]:
We'll start by initializing a dictionary that will take a count how many times we have seen a word or sentence, an output list for returning the top three searched words or sentences, a stack for a Depth-First Search we will be performing later, and a heap to sort the results.
seen, output, stack, heap = {}, [], [], []
Let's grab the last character node of the given prefix by passing the given input to the starts_with
function.
prefix = self.starts_with(words)
If we don't have a prefix that was returned then we have nothing to return.
if not prefix:
return None
Otherwise, we will append that node to the stack and begin a DFS to get each suffix word or sentence and to also count these within our dictionary.
stack.append(prefix)
While we still have character nodes on the stack, we will pop a node, see if it is the last node of a previously searched word or sentence that we placed in our Trie, and if it is, we will place it into our seen dictionary with the number of visits it had.
while stack:
node = stack.pop()
if node.end_here:
seen[node.sentence] = node.visits
Afterward, we will place each child character node in the stack.
for child in node.children:
next_node = node.children[child]
stack.append(next_node)
Once we have grouped each word(s) by frequency, we will go through and push these frequencies and word(s) into a max-heap by negating the frequency values for Python's default min-heap.
for words in seen:
heapq.heappush(heap, (-seen[words], words))
Now that we have all of the searches grouped by frequency, we will continuously pop the most frequent search and append the search to the output until our hottest count has been decremented, or the heap has been emptied in case there were fewer than three results.
hottest = 3
while heap:
if not hottest:
break
frequency, words = heapq.heappop(heap)
output.append(words)
hottest -= 1
Once we have our hottest results, we will return them.
return output
Let's go back to the given class and functions now that we have our helper functions built.
class AutocompleteSystem:
We will initialize a trie and an empty web search, then for each list of sentences and times, we will insert them into the Trie.
def __init__(self, sentences: List[str], times: List[int]):
self.trie = Trie()
self.web_search = ""
for i in range(len(sentences)):
self.trie.insert(sentences[i], times[i])
Within our input function, if we have gotten a pound sign then we will insert the web search that we built up in previous function calls, clear the web search and return an empty list.
def input(self, c: str) -> List[str]:
if c == "#":
self.trie.insert(self.web_search, 0)
self.web_search = ""
return []
If we have gotten anything else, we will add it to the web search and return the hottest three suffixes that are in the Trie.
self.web_search += c
return self.trie.hottest(self.web_search)