String Compression

Patrick Leaton
Problem Description

Given an array of characters chars, compress it using the following algorithm:

Begin with an empty string s. For each group of consecutive repeating characters in chars:

  • If the group's length is 1, append the character to s.
  • Otherwise, append the character followed by the group's length.

The compressed string s should not be returned separately, but instead be stored in the input character array chars. Note that group lengths that are 10 or longer will be split into multiple characters in chars.

After you are done modifying the input array, return the new length of the array.

 

Follow up:
Could you solve it using only O(1) extra space?

 

Example 1:

Input: chars = ["a","a","b","b","c","c","c"]
Output: Return 6, and the first 6 characters of the input array should be: ["a","2","b","2","c","3"]
Explanation: The groups are "aa", "bb", and "ccc". This compresses to "a2b2c3".

Example 2:

Input: chars = ["a"]
Output: Return 1, and the first character of the input array should be: ["a"]
Explanation: The only group is "a", which remains uncompressed since it's a single character.

Example 3:

Input: chars = ["a","b","b","b","b","b","b","b","b","b","b","b","b"]
Output: Return 4, and the first 4 characters of the input array should be: ["a","b","1","2"].
Explanation: The groups are "a" and "bbbbbbbbbbbb". This compresses to "ab12".

Example 4:

Input: chars = ["a","a","a","b","b","a","a"]
Output: Return 6, and the first 6 characters of the input array should be: ["a","3","b","2","a","2"].
Explanation: The groups are "aaa", "bb", and "aa". This compresses to "a3b2a2". Note that each group is independent even if two groups have the same character.

 

Constraints:

  • 1 <= chars.length <= 2000
  • chars[i] is a lower-case English letter, upper-case English letter, digit, or symbol.

 

The description was taken from https://leetcode.com/problems/string-compression/.

Problem Solution

#O(N) Time, O(1) Space
class Solution:
    def compress(self, chars: List[str]) -> int:
        left, right = 0, 0
        while(right < len(chars)):
            char = chars[right]
            count = 0

 
            while(right < len(chars) and chars[right] == char):
                right += 1                              
                count += 1

 
            chars[left] = char
            left += 1
            if count > 1:
                for digit in str(count):
                    chars[left] = digit
                    left += 1
        return left

Problem Explanation


If we are given a problem that involves swapping, overwriting, or moving elements in an array or string, typically utilizing two pointers is a good approach.

What we can do is have one pointer, let's call it right, which will grab the character we need to compress and its frequency.  Then, our second pointer, let's call it left, which will write the character, and its frequency if it is greater than one.

Once we have traversed the entire array, we will return the left pointer since it will have stopped at the last digit it had to write and this will be the last index of the new array.


 Let's start by initializing both pointers at the first index of the array.

        left, right = 0, 0

 

Then, while the right pointer has characters to read, we will grab the new character it is at and initialize a counter to zero.

            char = chars[right]
            count = 0

 

While the right pointer is still at an instance of that character, we will keep moving it and counting each of the instances.

            while(right < len(chars) and chars[right] == char):
                right += 1                              
                count += 1

 

Now that we have the character and the frequency, we will write the character at the left index and take a step forward to set a placeholder for either the count if the frequency was greater than one, or the next unique character.

            chars[left] = char
            left += 1

 

According to the problem constraints, each index is either a letter, digit, or symbol.

That means when we're writing the frequency, each index will need to be zero through nine.

To handle this for frequencies greater than nine, we will convert the count to a string, then traverse the string from left to right writing one character at a time and moving the left pointer to set a placeholder for the next digit or next unique character if we're finished.

            if count > 1:
                for digit in str(count):
                    chars[left] = digit
                    left += 1

 

Once we have finished compressing the characters, we will return the left pointer since it will have written the last unique character or digit in the new array.

        return left