Top K Frequent Words

Patrick Leaton
Problem Description

Given an array of strings words and an integer k, return the k most frequent strings.

Return the answer sorted by the frequency from highest to lowest. Sort the words with the same frequency by their lexicographical order.

 

Example 1:

Input: words = ["i","love","leetcode","i","love","coding"], k = 2
Output: ["i","love"]
Explanation: "i" and "love" are the two most frequent words.
Note that "i" comes before "love" due to a lower alphabetical order.

Example 2:

Input: words = ["the","day","is","sunny","the","the","the","sunny","is","is"], k = 4
Output: ["the","is","sunny","day"]
Explanation: "the", "is", "sunny" and "day" are the four most frequent words, with the number of occurrence being 4, 3, 2 and 1 respectively.

 

Constraints:

  • 1 <= words.length <= 500
  • 1 <= words[i] <= 10
  • words[i] consists of lowercase English letters.
  • k is in the range [1, The number of unique words[i]]

 

Follow-up: Could you solve it in O(n log(k)) time and O(n) extra space?

The description was taken from https://leetcode.com/problems/top-k-frequent-words/.

Problem Solution

#O(N(Log K)) Time, O(N) Space
class Solution:
    def topKFrequent(self, words: List[str], k: int) -> List[str]:
        seen, heap, output = {}, [], []
       
        for word in words:
            if word not in seen:
                seen[word] = -1
            else:
                seen[word] -= 1
       
        for word in seen:
            heapq.heappush(heap, (seen[word], word))
       
        while heap and k > 0:
            frequency, word = heapq.heappop(heap)
            output.append(word)
            k -= 1
       
        return output

Problem Explanation


This question is nearly the same as Top K Frequent Elements.

Right off the bat, the problem title states we are returning the k most frequent words.  

The "k" is usually an indication that this can be solved by utilizing a heap.  The "most" tells us that this may be a max heap.  

From those clues, the implementation is how we may expect.  We will use a hashmap to count the frequency of each word in the input array then we will take those frequencies and push them into a heap.  We will then continuously pop the most frequent word off of the heap and append it to the output.  Once we have done that k times, we are done.

Thankfully for us, the heap will sort on character values when there is a tie in frequencies.


Let's start by initializing our heap, our output, and our seen hashmap.

        seen, heap, output = {}, [], []

 

Next, we will go through the array and count the times we see each word.

        for word in words:

 

If the word isn't in the hashmap already, we will put it there with an initial count of negative one.

            if word not in seen:
                seen[word] = -1

 

If it is in the hashmap, we will decrement the count by one.

            else:
                seen[word] -= 1

 

To make a makeshift max heap from a min heap, we are making the heap input, the frequency, negative so that the largest frequency in the heap becomes the smallest when negative.

After have counted frequencies, we will go through each word in the seen hashmap and push the frequency and itself into the heap.  It is important the frequency is first because the heap primarily sorts on the first value inside the tuple.

        for word in seen:
            heapq.heappush(heap, (seen[word], word))

 

Then, while we still have values in the heap and while we haven't found the k most frequent words, we will pop the top frequent word off of the heap along with its count.

        while heap and k > 0:
            frequency, word = heapq.heappop(heap)

 

We will append the current top frequent word to the output and decrement k since we just found one more frequent word.

            output.append(word)
            k -= 1

 

Once we have run out of frequencies or have found k most frequent words, we will return those words in the output.

        return output