Find Duplicate Subtrees

Patrick Leaton
Problem Description

Given the root of a binary tree, return all duplicate subtrees.

For each kind of duplicate subtrees, you only need to return the root node of any one of them.

Two trees are duplicate if they have the same structure with the same node values.

 

Example 1:

Input: root = [1,2,3,4,null,2,4,null,null,4]
Output: [[2,4],[4]]

Example 2:

Input: root = [2,1,1]
Output: [[1]]

Example 3:

Input: root = [2,2,2,3,null,3,null]
Output: [[2,3],[3]]

 

Constraints:

  • The number of the nodes in the tree will be in the range [1, 10^4]
  • -200 <= Node.val <= 200

 

The description was taken from https://leetcode.com/problems/find-duplicate-subtrees/.

Problem Solution

#O(N) Time, O(N) Space
class Solution:
    def findDuplicateSubtrees(self, root: Optional[TreeNode]) -> List[Optional[TreeNode]]:
        seen, output = {}, []
       
        def postorder(node:TreeNode) -> str:
            if not node:
                return ""
            left = postorder(node.left)
            right = postorder(node.right)
            path = str(node.val) + "l" + left + "r" + right
            if path in seen:
                seen[path] = node
            else:
                seen[path] = 0
            return path
       
        postorder(root)
       
        for node in seen:
            if seen[node]:
                output.append(seen[node])
               
        return output

Problem Explanation


Perhaps a brute force way to solve this would be to traverse the tree once and group together every duplicate tree node within a HashMap.  Then from there, recursively check if each duplicate node is the same subtree as each other node using a recursive checker function like in the problem Same Tree.

That would require a quadratic time complexity, however.

What we can do instead is perform a postorder traversal and serialize the Depth-First Search path taken from each child node up to the current node.  What this means is we could convert each node value within the DFS path to a string so that we may hash these paths and find duplicates that way within one traversal.

A postorder traversal seems to be our best bet here because if for example, we tried a preorder where we visited each node before their children, then we may find a duplicate top subtree but the bottom subtree may have additional, different, or fewer nodes which wouldn't make it actually a duplicate subtree.   Through a postorder traversal where we visit each node after its children, we can be sure that the entire subtree is the same up to the subtree root, and not just the top.

The main trick we will utilize here is if we find a duplicate path within our HashMap then we will store the subtree root node within the hashmap too, mapped to its path.  If it is our first time seeing a path then we will just store it with a zero value mapped to it.  This way, at the end of our traversal, all we will have to do is return the non-zero values from the hashmap.


 Let's start by initializing our HashMap and our output list.

        seen, output = {}, []

 

Next, we will make our postorder traversal which will be returning a string within each function call.

        def postorder(node:TreeNode) -> str:

 

If the current node is null, then we will return an empty string.

            if not node:
                return ""

 

Otherwise, we will continue the traversal on both the left and right children of the current node.

            left = postorder(node.left)
            right = postorder(node.right)

 

When the DFS yoyo has returned, we will create a path string of the current node concatenated with the left and right strings.

It is important here to use an identifier in the scenario that the "duplicate" subtrees are actually mirrored.   We will need to be able to map serialized paths with child node sides.

            path = str(node.val) + "l" + left + "r" + right

 

If the current path has up to the current node has been seen before, then we have a duplicate subtree so we will store the node within the HashMap.

            if path in seen:
                seen[path] = node

 

Otherwise, we will initialize the path within our dictionary with a value of zero.

            else:
                seen[path] = 0

 

After we have done that, we will return the path so that it may be concatenated in the current node's parent's function call.

            return path

 

Now that our helper function is built, we just need to make the initial call on the root node, look for any non-zero values within our hashmap, add those to the output then return them.

        postorder(root)
       
        for node in seen:
            if seen[node]:
                output.append(seen[node])
               
        return output