Find Median from Data Stream

Patrick Leaton
Problem Description

The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value and the median is the mean of the two middle values.

  • For example, for arr = [2,3,4], the median is 3.
  • For example, for arr = [2,3], the median is (2 + 3) / 2 = 2.5.

Implement the MedianFinder class:

  • MedianFinder() initializes the MedianFinder object.
  • void addNum(int num) adds the integer num from the data stream to the data structure.
  • double findMedian() returns the median of all elements so far. Answers within 10-5 of the actual answer will be accepted.

 

Example 1:

Input
["MedianFinder", "addNum", "addNum", "findMedian", "addNum", "findMedian"]
[[], [1], [2], [], [3], []]
Output
[null, null, null, 1.5, null, 2.0]

Explanation
MedianFinder medianFinder = new MedianFinder();
medianFinder.addNum(1);    // arr = [1]
medianFinder.addNum(2);    // arr = [1, 2]
medianFinder.findMedian(); // return 1.5 (i.e., (1 + 2) / 2)
medianFinder.addNum(3);    // arr[1, 2, 3]
medianFinder.findMedian(); // return 2.0

 

Constraints:

  • -10^5 <= num <= 10^5
  • There will be at least one element in the data structure before calling findMedian.
  • At most 5 * 10^4 calls will be made to addNum and findMedian.

 

The description was taken from https://leetcode.com/problems/find-median-from-data-stream/.

Problem Solution

#O(Log N) Time, O(N) Space
class MedianFinder:
    def __init__(self):
        self.left, self.right = [], []
 
    def addNum(self, num: int) -> None:
        heapq.heappush(self.left, -num)

 
        if self.left and self.right and -self.left[0] > self.right[0]:
            heapq.heappush(self.right, -heapq.heappop(self.left))

 
        if len(self.left) > len(self.right) + 1:
            heapq.heappush(self.right, -heapq.heappop(self.left))

 
        if len(self.left) + 1 < len(self.right):
            heapq.heappush(self.left, -heapq.heappop(self.right))
 
    def findMedian(self) -> float:
        if len(self.left) > len(self.right):
            return -self.left[0]

 
        if len(self.right) > len(self.left):
            return self.right[0]

 
        return (-self.left[0] + self.right[0]) / 2

 

Problem Explanation


The most obvious way to solve this would be to push all of the input values into a list, sort it, and return the middle value whenever the findMedian function is called.

What we can do instead is break that list into two sorted lists, one in descending order and one in ascending order, so that the median value will be at the end of the left list, the beginning of the second list, or the mean between the two values at the end of both lists.

We can use two heaps to fulfill this intuition, a max-heap, and a min-heap.  We will push the values into the heaps, rebalance if one heap's length is two sizes greater than the other or if one has values the other should have, and at the end, we will be able to return the median in constant time.


We will start by initializing both heaps in the class constructor, the left and right heap.

    def __init__(self):
        self.left, self.right = [], []


Next, we will make our insertion function.

    def addNum(self, num: int) -> None:
 

This function will by default push the value into the left heap, the max-heap, which will sort the values in ascending order.  We could choose whichever heap to push the value into because we are going to rebalance them later in this function.

In Python, we create make-shift max-heaps from min-heaps by making all the values negative so that the greatest value will be the smallest value when negative, and will be the root of the heap.  Any time we insert or pull from this left heap, we need to make sure we negate the value.

        heapq.heappush(self.left, -num)

 

First, we will rebalance based on values.  The left heap is supposed to have a maximum value that isn't greater than any values in the right one. 

                         2          3
                      /                \
                   1                    4

 

This way, we can get something like this to find the medians, [1,2,3,4]

If that case is broken, we would have something like this.

           5                               3
         /    \                          /
       1      2                     4
 
                   [1,2,5,3,4]
 
 
To avoid this, we will simply check the top value of the left heap and if it is greater than the top value of the right, we will put it in the right heap instead.
 
           2                              3
         /                              /     \
       1                            4        5
 
                   [1,2,3,4,5]

 

    if self.left and self.right and -self.left[0] > self.right[0]:
            heapq.heappush(self.right, -heapq.heappop(self.left))

 

Similarly, if one heap size is greater than the other by more than one, it is going to skew our median left or right by one index. 

If this is the case, we will just take from the bigger heap and give to the smaller one.

        if len(self.left) > len(self.right) + 1:
            heapq.heappush(self.right, -heapq.heappop(self.left))

 

        if len(self.left) + 1 < len(self.right):
            heapq.heappush(self.left, -heapq.heappop(self.right))


The findMedian function is pretty straightforward now.

The median will be the root of the bigger heap if there is an odd number of inputs.

        if len(self.left) > len(self.right):
            return -self.left[0]


        if len(self.right) > len(self.left):
            return self.right[0]

 

Otherwise, it will be the mean between both roots.

        return (-self.left[0] + self.right[0]) / 2