Index Pairs of a String

Patrick Leaton
Problem Description

Given a text string and words (a list of strings), return all index pairs [i, j] so that the substring text[i]...text[j] is in the list of words.

 

Example 1:

Input: text = "thestoryofleetcodeandme", words = ["story","fleet","leetcode"]
Output: [[3,7],[9,13],[10,17]]

Example 2:

Input: text = "ababa", words = ["aba","ab"]
Output: [[0,1],[0,2],[2,3],[2,4]]
Explanation: 
Notice that matches can overlap, see "aba" is found in [0,2] and [2,4].

 

Note:

  1. All strings contains only lowercase English letters.
  2. It's guaranteed that all strings in words are different.
  3. 1 <= text.length <= 100
  4. 1 <= words.length <= 20
  5. 1 <= words[i].length <= 50
  6. Return the pairs [i,j] in sorted order (i.e. sort them by their first coordinate in case of ties sort them by their second coordinate).

 

The description was taken from https://leetcode.com/problems/index-pairs-of-a-string/.

Problem Solution

#O(N) Time, O(M) Space where M is Length of Keys
class Solution:
    def indexPairs(self, text: str, words: List[str]) -> List[List[int]]:
        Trie = lambda: collections.defaultdict(Trie)
        trie = Trie()
        is_end = True
        result = []
       
        for i, word in enumerate(words):
            curr_node = trie
            for char in word:
                curr_node = curr_node[char]
            curr_node[is_end] = i
        
        for i in range(0,len(text)):
            curr_node = trie
            for j in range(i, len(text)):
                if curr_node[text[j]] == {}:
                    break
                curr_node = curr_node[text[j]]
                if is_end in curr_node:
                    result.append((i, j))
        return result

Problem Explanation


We can use a Trie to solve this problem.

Whenever we are asked to traverse word prefixes or substrings, a Trie is typically a go-to approach so that we'll be able to traverse common prefixes in linear time instead of quadratic.

We'll build a Trie using the words given and then we will traverse the Trie with two pointers so that we can find the range of the substrings that are in the Trie.


In Python, we can use a default dictionary to implement our Trie.  We will define it and instantiate it here.

        Trie = lambda: collections.defaultdict(Trie)
        trie = Trie()

 

We will also have a boolean variable, is_end.  We will use this to create nodes within our Trie that will signify a stopping point in traversals and also indicate complete words.

        is_end = True

 

Let's also initialize our result array.

        result = []

 

We will then enumerate through the words to traverse each word while also having a counter for the indices.  

Let's also create a current node pointer and set it to the root of the Trie.

        for i, word in enumerate(words):
            curr_node = trie

 

As we are traversing the words list, we will also want to traverse the characters within each word.  We will do this by using an inner loop.

For each character in the word we are traversing, we will push the character into our Trie and iterate to it.

            for char in word:
                curr_node = curr_node[char]

 

Once we have pushed every character into the Trie, we will add an ending node set to the index of the current word from our input list.

            curr_node[is_end] = i

 

Next, we will traverse the array and set the current node pointer back to the root of the trie.

        for i in range(0,len(text)):
            curr_node = trie

 

This outer loop will give us the starting index of each word within the string as we traverse.

We will need another loop to give us the ending index.

            for j in range(i, len(text)):

 

During our traversal, if the node of the character we're currently looking at is empty, then we don't have this character within our Trie so we will break.

                if curr_node[text[j]] == {}:
                    break

 

If we do have the character, we will iterate to the node.

                curr_node = curr_node[text[j]]

 

If we have an ending node as a child, we have a complete word so we will append the index pair of i and j to our result.

                if is_end in curr_node:
                    result.append((i, j))

 

Once we have completed our traversal and appended each valid word string index pair, we will return the result.

        return result