Word Break

Patrick Leaton
Problem Description

Given a non-empty string s and a dictionary wordDict containing a list of non-empty words, determine if s can be segmented into a space-separated sequence of one or more dictionary words.

Note:

  • The same word in the dictionary may be reused multiple times in the segmentation.
  • You may assume the dictionary does not contain duplicate words.

Example 1:

Input: s = "leetcode", wordDict = ["leet", "code"]
Output: true
Explanation: Return true because "leetcode" can be segmented as "leet code".

Example 2:

Input: s = "applepenapple", wordDict = ["apple", "pen"]
Output: true
Explanation: Return true because "applepenapple" can be segmented as "apple pen apple".
             Note that you are allowed to reuse a dictionary word.

Example 3:

Input: s = "catsandog", wordDict = ["cats", "dog", "sand", "and", "cat"]
Output: false

 

The description was taken from https://leetcode.com/problems/word-break/.

Problem Solution

#DP
#O(N^3) Time, O(N) Space
class Solution:
    def wordBreak(self, s: str, wordDict: List[str]) -> bool:
        words = {}
        for word in wordDict:
            words[word] = True
           
        dp = [True] + [False] * (len(s))
        for i in range(1, len(dp)):
            for j in range(0, i):
                if dp[j] and s[j:i] in words:
                    dp[i] = True
                    break
        return dp[-1]

#BFS
#O(N^3) Time, O(N) Space
class Solution:
    def wordBreak(self, s: str, wordDict: List[str]) -> bool:
        words, seen = {}, {}
       
        for word in wordDict:
            words[word] = True
       
        def bfs() -> bool:
            queue = collections.deque([(0)])
            while queue:
                width = len(queue)
                for _ in range(width):
                    root = queue.popleft()
                    if root == len(s):
                        return True
                    for child in range(root+1, len(s)+1):
                        if child in seen:
                            continue
                        if s[root:child] in words:
                            seen[child] = True
                            queue.append(child)
            return False
                           
        return bfs()

Problem Explanation


Dynamic Programming

If we are given two inputs and we are asked to find if those two inputs can be correlated, what the minimum amount of iterations it takes for one to be the other, or what amount of one exists in the other, things like that, typically that is an indication that Dynamic Programming can be applied.

That is the approach we will start with.

More specifically, we can build an array that we will use as a lookup table.  We will split the input string into subproblems of length one, two, three, etc.  Within each subproblem, we will be asking "is this substring a word, and have we answered true the previous times we have asked this question?", if both of these conditions are true then we will expand our length by one character and repeat.

If every value in our lookup table is true, then the answer is true.


Let's start by creating a dictionary for the words we are given and then pushing each word into it so that we can make these lookups in constant time.

        words = {}
        for word in wordDict:
            words[word] = True

 

Next, we will create our dp array that will hold the answers for each subproblem that we will refer to.

The base case of dp[0] will be if we received a string of length zero, and this case would always be true.

        dp = [True] + [False] * (len(s))

 

We will then create our "expansion" loop, with the iterator i and this loop will traverse to the last subproblem of the dp array.  This is the loop that we will use to expand the range of our subproblem if each of the previous subproblems were true.

        for i in range(1, len(dp)):

 

Next, we will create our "subproblem partition" loop with the iterator and this loop will only focus on the range between the first character of the string to the character i is at.  

This loop is where we break the overall problem into subproblems, notice that the range of the traversals for this loop will go from zero to one, zero to two, zero to three, etc.

            for j in range(0, i):

 

Here is the question we will ask in each subproblem.

When we asked this question in a previous iteration, did we say it was true?  Also, does the substring ranging from index j to index i exist within the list of words we are given?

If so, we will mark the end of this subproblem expansion true so that we can let our future selves know that we were good up until this point, then break from this subproblem.

                if dp[j] and s[j:i] in words:
                    dp[i] = True
                    break

 

Once we are done with our traversals and filling the array, we will return the last index because that will let us know if all of our subproblems were true or not.

        return dp[-1]



Breadth-First Search

It may be more intuitive to solve this problem with a Breadth-First Search.

If we were to visualize this input string as one directed graph:

l->e->e->t->c->o->d->e->

 

We would be able to take the subgraphs from the words list and see if they could be joined to make the complete graph:

l->e->e->t->   True

c->o->d->e-> True

l->e->e->t->c->o->d->e-> True


Let's start by initializing our dictionary of words that we'll push every word from the list into for a constant lookup time, and also a dictionary for seen characters so that we don't revisit them.

        words, seen = {}, {}

 

Next, we'll go through the list of words and push each word into words dictionary.

        for word in wordDict:
            words[word] = True


Now we'll make our BFS function.

        def bfs() -> bool:

 

We'll start by pushing the index of the first character into the queue.
            queue = collections.deque([(0)])

 

While there are still character nodes in the queue, we will continue searching for subgraphs that we can build.

            while queue:

 

We can start each iteration by calculating the width of the queue to know how many nodes are in the current layer, this isn't required but it is a helpful practice for BFS traversals to calculate time or distance traversed.

                width = len(queue)

 

Then, within the current layer, we will pop a candidate subgraph root from the queue and check if its index is directly after the length of the input string.  

Meaning for example, if we have "leetcode", "e", is at index seven.  If we pop off index eight, then that means we have found all of the subgraphs we needed to complete the overall graph.

If that is the case, we will return true.

                for _ in range(width):
                    root = queue.popleft()
                    if root == len(s):
                        return True

 

If we haven't, we will keep building the graph.

For each child node from the index after the root to the end of the string, if we haven't already visited the child's index then we will add it onto the subgraph until we see that we have built a subgraph word that exists within the words list we were given.

l->e->
l->e->e->
l->e->e->t-> True

 

Once we have built a word we need, we will mark the character that comes directly after the end of the word as seen and push it into the queue to visit it next.

l->e->e->t->c

                    for child in range(root+1, len(s)+1):
                        if child in seen:
                            continue
                        if s[root:child] in words:
                            seen[child] = True
                            queue.append(child)

 

One thing to note about the code is Python string slicing excludes the ending range value so to get the "leet" string slice, the ending range would have to be four, remember strings and arrays begin at index zero.  This would also be the index of the "c" in "leetcode", which is why the child index value is shared between these two lines of code.  

If we never returned true, then that means we kept building a subgraph with the remaining character nodes of the input string and that subgraph didn't exist within the list of words so we will return false.

            return False


Now that we have our BFS function built, we just need to make the initial call.

        return bfs()