Sort Characters By Frequency

Patrick Leaton
Problem Description

Given a string s, sort it in decreasing order based on the frequency of characters, and return the sorted string.

 

Example 1:

Input: s = "tree"
Output: "eert"
Explanation: 'e' appears twice while 'r' and 't' both appear once.
So 'e' must appear before both 'r' and 't'. Therefore "eetr" is also a valid answer.

Example 2:

Input: s = "cccaaa"
Output: "aaaccc"
Explanation: Both 'c' and 'a' appear three times, so "aaaccc" is also a valid answer.
Note that "cacaca" is incorrect, as the same characters must be together.

Example 3:

Input: s = "Aabb"
Output: "bbAa"
Explanation: "bbaA" is also a valid answer, but "Aabb" is incorrect.
Note that 'A' and 'a' are treated as two different characters.

 

Constraints:

  • 1 <= s.length <= 5 * 10^5
  • s consists of English letters and digits.

 

The description was taken from https://leetcode.com/problems/sort-characters-by-frequency/.

Problem Solution

#O(N(Log N)) Time, O(N) Space
class Solution:
    def frequencySort(self, s: str) -> str:
        heap, output, seen = [], [], {}
       
        for char in s:
            if char not in seen:
                seen[char] = -1
            else:
                seen[char] -= 1
       
        for char in seen:
            heapq.heappush(heap, (seen[char], char))
       
        while heap:
            frequency, char = heapq.heappop(heap)
            output.append(char * -frequency)
       
        return "".join(output)

Problem Explanation


This problem is very similar to Top K Frequent Elements.

The main difference is that we aren't going to be processing k characters from the string, but all of the characters.  With that being said, a sorting algorithm can be used for the same space complexity as a heap, since we are going to have to push and pop n elements from the heap.  However, a heap will allow us to group the frequencies and the characters together so that we can have a neat solution.

We'll go through the string, count the frequency of each character, then push each frequency and character into a heap, then we will continuously pop each frequency and character from the heap, appending the most frequent characters first.  Once we have no more characters in the heap, we are done so we can return the output string.


Let's start by initializing our heap, output array, and seen hashmap.

        heap, output, seen = [], [], {}

 

It is important here not to use an empty string because strings are immutable, meaning if we want to add to a string we will have to copy it, then append to it each time since we can't change the elements.  This would require a quadratic time complexity.

Next, we will go through each character in the string and count its frequency using our hashmap.

        for char in s:

 

If the character isn't in the hashmap already, we will put it there with an initial count of negative one.

            if char not in seen:
                seen[char] = -1

 

If it is in the hashmap, we will decrement the count by one.

            else:
                seen[char] -= 1

 

To create a makeshift max heap from a min heap, we are making the heap input, the frequency, negative so that the largest frequency in the heap becomes the smallest when negative.

After we have counted frequencies, we will go through each character in the hashmap and push its frequency and itself into the heap.  It is important the frequency is first because the heap sorts on the first value inside the tuple.

        for char in seen:
            heapq.heappush(heap, (seen[char], char))

 

Then, while we have characters still in the heap, we will pop the most frequent character off.

        while heap:
            frequency, char = heapq.heappop(heap)

 

We will then repeat the character as many times as its frequency then append that string to the output array.  

Remember, the frequency values are negative so we will need to multiply them by negative to get a positive value.

            output.append(char * -frequency)

 

Once we have no more characters in the heap, we will join the string values together from the output array and return this output string.

        return "".join(output)