Alien Dictionary

Patrick Leaton
Problem Description

There is a new alien language that uses the English alphabet. However, the order among the letters is unknown to you.

You are given a list of strings words from the alien language's dictionary, where the strings in words are sorted lexicographically by the rules of this new language.

Return a string of the unique letters in the new alien language sorted in lexicographically increasing order by the new language's rules. If there is no solution, return "". If there are multiple solutions, return any of them.

A string s is lexicographically smaller than a string t if at the first letter where they differ, the letter in s comes before the letter in t in the alien language. If the first min(s.length, t.length) letters are the same, then s is smaller if and only if s.length < t.length.

 

Example 1:

Input: words = ["wrt","wrf","er","ett","rftt"]
Output: "wertf"

Example 2:

Input: words = ["z","x"]
Output: "zx"

Example 3:

Input: words = ["z","x","z"]
Output: ""
Explanation: The order is invalid, so return "".

 

Constraints:

  • 1 <= words.length <= 100
  • 1 <= words[i].length <= 100
  • words[i] consists of only lowercase English letters.

 

The description was taken from https://leetcode.com/problems/alien-dictionary/.

Problem Solution

#O(C) Time, O(1) Space where C is number of total characters.
class Solution:
    def alienOrder(self, words: List[str]) -> str:
        adjacency, to_pointers = {}, {}
        queue = collections.deque()
        output = []
       
        for word in words:
            for char in word:
                adjacency[char] = set()
                to_pointers[char] = 0
       
        for i in range(0, len(words)-1):
            word_one = words[i]
            word_two = words[i+1]
            min_len = min(len(word_one), len(word_two))
            difference = False
            for char in range(min_len):
                if word_one[char] != word_two[char]:
                    difference = True
                    if word_two[char] not in adjacency[word_one[char]]:
                        adjacency[word_one[char]].add(word_two[char])
                        to_pointers[word_two[char]] += 1
                    break
            if not difference and len(word_one) > len(word_two):
                return ""
       
        for char in to_pointers:
            if not to_pointers[char]:
                queue.append(char)
       
        while queue:
            width = len(queue)
            for _ in range(width):
                char = queue.popleft()
                output.append(char)
                for neighbor in adjacency[char]:
                    to_pointers[neighbor] -= 1
                    if to_pointers[neighbor] == 0:
                        queue.append(neighbor)
       
        if len(output) == len(adjacency):
            return "".join(output)
       
        return ""
                   

Problem Explanation


In the first version of this problem, Verifying an Alien Dictionary, we were given an ordering of letters we were told was an alien dictionary and our task was to return if it was valid.

Here, we are given a list of words and we are being asked to reverse-engineer an ordering from them.  

To do this, we will need to map which characters came before other ones, so we can get some kind of indication of an ordering.  If we were trying to tell that from two words, we wouldn't know the result until we found a different character within the two words.

From "wrt" and "wrf", we don't know the ordering of "w" and "r" because they are ordered within the same indices in both strings, but we can tell that "t" comes before "f" since they are different and "wrt" comes before "wrf", so we can note that.

w -> r

From "wrf" and "er", we are now given the context that "w" comes before "e", so we can note that.

w -> e

We can solve this problem by traversing the list of words, two words at a time, and making these mappings.  Then, we will search each character that came first in these individual mappings to build a dictionary that way, using a Breadth-First Search.  This algorithm of searching childless or parentless nodes is also known as topological sorting.


Let's start by initializing our adjacency list for the mapping of each prerequisite character to its next character, our to_pointers list which will contain each character that has a prerequisite character, and our queue for our BFS.

        adjacency, to_pointers = {}, {}
        queue = collections.deque()

 

We'll also need an output list to place the alien dictionary character ordering into.

        output = []

 

For each character in the word list, we will initialize their place within our adjacency and to_pointers lists.

        for word in words:
            for char in word:
                adjacency[char] = set()
                to_pointers[char] = 0

 

Now, we'll make our traversal through the words list.

We have two goals here: to build character order mappings and to validate that the input was sufficient enough to do this, the second goal being the same as the previous variation of this problem.

        for i in range(0, len(words)-1):

 

Within each index, we will take two words, the current word, and the word directly thereafter.

            word_one = words[i]
            word_two = words[i+1]

 

We'll traverse the characters in the shorter word so that we don't get an index out-of-bounds error.

            min_len = min(len(word_one), len(word_two))

 

Here, we will also want a flag to let us know if we found a difference which will be explained more in-depth later.

            difference = False

 

Let's start traversing the characters of these two words until we found a difference that will indicate which character in the first word comes before the character in the second word within the alien dictionary.

            for char in range(min_len):

 

If we find a difference, then we will flip our flag to true, create a pointer to the second word's character from the first character within our adjacency list, and increment the number of pointers the second character has by one.

One thing to note is that we won't want to increment this value if we already made this connection between the two characters because we only want unique mappings between characters or else, later on, we will think that more characters are pointing to the second character than they really are so we won't be able to make the ordering appropriately.

            for char in range(min_len):
                if word_one[char] != word_two[char]:
                    difference = True
                    if word_two[char] not in adjacency[word_one[char]]:
                        adjacency[word_one[char]].add(word_two[char])
                        to_pointers[word_two[char]] += 1

 

Once we find the first difference, we will break because similar to our human English dictionary, the sorting of the first different character is what will identify that two words are sorted.  We can verify "amazon" and "apple" are sorted since "m" comes before "p", and this also isolates the mapping of m->p.

Once we find that first difference, we will break and begin comparing the characters in word two with the word that comes after it.

                    break

 

If we have gone through all the characters in the smaller word and have not found a difference between it and the characters in the larger word, we will check the length of the words.  If the length of the first word is larger than the second word, that means the second word has null characters that are lexicographically lesser than the remaining characters in the first word and makes it unsorted so we aren't given a sufficient list of words to build our alien dictionary off of and will return nothing.

            if not difference and len(word_one) > len(word_two):
                return ""

 

Now that we have our mappings built between the characters we were able to find an order for, the characters without other characters pointing to them should be visited first.

The words in the first example from the problem description will give us:

w->e

r->t

t->f

f->none

e->r

 

If we combined the pointers, it would be:

w->e->r->t->f->none

 

"W" has no pointers going to it, so we will visit it within the first layer of our BFS.

        for char in to_pointers:
            if not to_pointers[char]:
                queue.append(char)

 

Then, while we have character nodes in our queue, we will pop a character off, append it to the output, and for each node the current node is pointing to, we will decrement their to_pointer count to signify they now have one less prerequisite character after removing this current character from our search.

        while queue:
            width = len(queue)
            for _ in range(width):
                char = queue.popleft()
                output.append(char)
                for neighbor in adjacency[char]:
                    to_pointers[neighbor] -= 1

 

If they now have no characters pointing to them afterward, we will visit them next.

                    if to_pointers[neighbor] == 0:
                        queue.append(neighbor)

 

If we were able to create an ordering equal to the number of characters we have, we will return it.

         if len(output) == len(adjacency):
            return "".join(output)

 

Otherwise, our input had ordering contradictions or was otherwise an insufficient input for us to create a valid order mapping from.

         return ""