Concatenated Words

Patrick Leaton
Problem Description

Given an array of strings words (without duplicates), return all the concatenated words in the given list of words.

concatenated word is defined as a string that is comprised entirely of at least two shorter words in the given array.

 

Example 1:

Input: words = ["cat","cats","catsdogcats","dog","dogcatsdog","hippopotamuses","rat","ratcatdogcat"]
Output: ["catsdogcats","dogcatsdog","ratcatdogcat"]
Explanation: "catsdogcats" can be concatenated by "cats", "dog" and "cats"; 
"dogcatsdog" can be concatenated by "dog", "cats" and "dog"; 
"ratcatdogcat" can be concatenated by "rat", "cat", "dog" and "cat".

Example 2:

Input: words = ["cat","dog","catdog"]
Output: ["catdog"]

 

Constraints:

  • 1 <= words.length <= 10^4
  • 0 <= words[i].length <= 1000
  • words[i] consists of only lowercase English letters.
  • 0 <= sum(words[i].length) <= 10^5

 

The description was taken from https://leetcode.com/problems/concatenated-words/.

Problem Solution

#DP
#O(N^4) Time, O(N) Space

class Solution:
    def findAllConcatenatedWordsInADict(self, words: List[str]) -> List[str]:
        output, words_dict = [], {}
        for word in words:
            words_dict[word] = True
       
        def dp(word:str) -> bool:
            if not word:
                return False
            del words_dict[word]
            dp = [True] + [False] * (len(word))
            for i in range(1, len(dp)):
                for j in range(i):
                    if dp[j] and word[j:i] in words_dict:
                        dp[i] = True
            words_dict[word] = True
            return dp[-1]
       
        for word in words:
            if dp(word):
                output.append(word)
        return output
 
#BFS
#O(N^4) Time, O(N) Space

class Solution:
    def findAllConcatenatedWordsInADict(self, words: List[str]) -> List[str]:
        output, words_dict = [], {}
        for word in words:
            words_dict[word] = True
           
        def bfs(word:str) -> bool:
            if not word:
                return False
            seen = {}
            del words_dict[word]
            queue = collections.deque([0])
            while queue:
                width = len(queue)
                for _ in range(width):
                    root = queue.popleft()
                    if root == len(word):
                        words_dict[word] = True
                        return True
                    for child in range(root+1, len(word)+1):
                        if child in seen:
                            continue
                        if word[root:child] in words_dict:
                            seen[child] = True
                            queue.append(child)
            words_dict[word] = True
            return False
       
        for word in words:
            if bfs(word):
                output.append(word)
        return output

Problem Explanation


These solutions are nearly the same as the Dynamic Programming and Breadth-First Search solutions from Word Break.

The only difference is we will be performing these searches on each word, excluding the current word, instead of trying to find if a single word can be built from the others.



Dynamic Programming

If we are given two inputs and we are asked to find if those two inputs can be correlated, what the minimum amount of iterations it takes for one to be the other, or what amount of one exists in the other, things like that, typically that is an indication that Dynamic Programming can be applied.

That is the approach we will start with.

More specifically, we can build an array that we will use as a lookup table.  We will split the input string into subproblems of length one, two, three, etc.  Within each subproblem, we will be asking "is this substring a word, and have we answered true the previous times we have asked this question?", if both of these conditions are true then we will expand our length by one character and repeat.

If every value in our lookup table is true, then the answer is true.


Let's start by creating an output array and dictionary for the words we are given and then pushing each word into it so that we can make these lookups in constant time.

        output, words_dict = [], {}
        for word in words:
            words_dict[word] = True


Next, we will create our dp function that will return if a given word is a concatenation of other words.

        def dp(word:str) -> bool:

 

If the given word is null, we will return false because it can't be comprised of any other words.

            if not word:
                return False

 

Otherwise, we'll want to start off by excluding the current word from our search, or else this function is going to return true for every word by looking itself up.

            del words_dict[word]

 

Next, we will create our dp array that will hold the answers for each subproblem that we will refer to.

The base case of dp[0] will be if we received a string of length zero, and this case would always be true.

            dp = [True] + [False] * (len(word))

 

We will then create our "expansion" loop, with the iterator and this loop will traverse to the last subproblem of the dp array.  This is the loop that we will use to expand the range of our subproblem if each of the previous subproblems were true.

            for i in range(1, len(dp)):

 

Next, we will create our "subproblem partition" loop with the iterator and this loop will only focus on the range between the first character of the string to the character i is at.  

This loop is where we break the overall problem into subproblems, notice that the range of the traversals for this loop will go from zero to one, zero to two, zero to three, etc.

                for j in range(i):

 

Here is the question we will ask in each subproblem.

When we asked this question in a previous iteration, did we say it was true?  Also, does the substring ranging from index j to index i exist within the list of words we are given?

If so, we will mark the end of this subproblem expansion true so that we can let our future selves know that we were good up until this point, then break from this subproblem.

                    if dp[j] and word[j:i] in words_dict:
                        dp[i] = True

 

Once we have finished tabulating words for the given input word, we will add the word back to the words dictionary and return the last index because that will let us know if all of our subproblems were true or not.

            words_dict[word] = True
            return dp[-1]


For each word in the words array, if we were able to find that the word was a concatenation through our DP function, we will append it to the output.

        for word in words:
            if dp(word):
                output.append(word)

 

Once we have finished gathering all concatenated words, we will return the output.

        return output

Breadth-First Search

It may be more intuitive to solve this problem with a Breadth-First Search.

If we were to visualize this input string as one directed graph:

l->e->e->t->c->o->d->e->

 

We would be able to take the subgraphs from the words list and see if they could be joined to make the complete graph:

l->e->e->t->   True

c->o->d->e-> True

l->e->e->t->c->o->d->e-> True


Let's start by creating an output array and dictionary for the words we are given and then pushing each word into it so that we can make these lookups in constant time.

        output, words_dict = [], {}
        for word in words:
            words_dict[word] = True


Now we'll make our BFS function.

This function will take a word input and will return whether it can be built from previous words.

        def bfs(word:str) -> bool:

 

If the given word is null, we will return false because it can't be comprised of any other words.

            if not word:
                return False

 

We are also going to want to make an internal list of indices that we have seen so that we don't revisit them.  We could make this nonlocal to the BFS function but we would need to make sure we wiped it after each word.

            seen = {}

 

We'll want to start off by excluding the current word from our search, or else this function is going to return true for every word by looking itself up.

            del words_dict[word]

 

Next, we will push the index of the first character into the queue.

            queue = collections.deque([0])

 

While there are still character nodes in the queue, we will continue searching for subgraphs that we can build.

            while queue:

 

We can start each iteration by calculating the width of the queue to know how many nodes are in the current layer, this isn't required but it is a helpful practice for BFS traversals to calculate time or distance traversed.

                width = len(queue)

 

Then, within the current layer, we will pop a candidate subgraph root from the queue and check if its index is directly after the length of the input string.  

Meaning for example, if we have "leetcode", "e", is at index seven.  If we pop off index eight, then that means we have found all of the subgraphs we needed to complete the overall graph.

If that is the case, we will add the word back to the word dictionary and return true.

                for _ in range(width):
                    root = queue.popleft()
                    if root == len(word):
                        words_dict[word] = True
                        return True

 

If we haven't, we will keep building the graph.

For each child node from the index after the root to the end of the string, if we haven't already visited the child's index then we will add it onto the subgraph until we see that we have built a subgraph word that exists within the words list we were given.

l->e->

l->e->e->

l->e->e->t-> True

 

Once we have built a word we need, we will mark the character that comes directly after the end of the word as seen and push it into the queue to visit it next.

l->e->e->t->c

                    for child in range(root+1, len(word)+1):
                        if child in seen:
                            continue
                        if word[root:child] in words_dict:
                            seen[child] = True
                            queue.append(child)

 

One thing to note about the code is Python string slicing excludes the ending range value so to get the "leet" string slice, the ending range would have to be four, remember strings and arrays begin at index zero.  This would also be the index of the "c" in "leetcode", which is why the child index value is shared between these two lines of code.  

If we never returned true, then that means we kept building a subgraph with the remaining character nodes of the input string and that subgraph didn't exist within the list of words so we will add the word back to the word dictionary and return false.

            words_dict[word] = True
            return False


For each word in the words array, if we were able to find that the word was a concatenation through our BFS function, we will append it to the output.

        for word in words:
            if bfs(word):
                output.append(word)

 

Once we have finished gathering all concatenated words, we will return the output.

        return output