Lesson 22 of 47 3 min

MANG Problem #34: Find Median from Data Stream (Hard)

Master the "Two Heaps" pattern to track the running median of a data stream in O(log n) time.

1. Problem Statement

The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value, and the median is the mean of the two middle values.

Implement the MedianFinder class:

  • MedianFinder() initializes the MedianFinder object.
  • void addNum(int num) adds the integer num from the data stream to the data structure.
  • double findMedian() returns the median of all elements so far.

2. Approach: Max-Heap and Min-Heap

graph LR
    subgraph "Small Half (Max-Heap)"
        Max[Top: Largest of Small]
    end
    
    subgraph "Large Half (Min-Heap)"
        Min[Top: Smallest of Large]
    end
    
    Median{Median?}
    Max --- Median --- Min

Sorting the array on every addNum takes $O(n \log n)$. We need to do it faster.

  1. The Two Halves: Split the numbers into a "Small" half and a "Large" half.
    • The "Small" half is represented by a Max-Heap (so we can get the largest of the smalls in $O(1)$).
    • The "Large" half is represented by a Min-Heap (so we can get the smallest of the larges in $O(1)$).
  2. Balancing Act: The sizes of the two heaps can differ by at most 1.
  3. Median:
    • If sizes are unequal, the median is the top of the larger heap.
    • If sizes are equal, the median is the average of the tops.

3. Java Implementation

class MedianFinder {
    private PriorityQueue<Integer> small; // Max-Heap
    private PriorityQueue<Integer> large; // Min-Heap

    public MedianFinder() {
        small = new PriorityQueue<>(Collections.reverseOrder());
        large = new PriorityQueue<>();
    }
    
    public void addNum(int num) {
        // 1. Add to small, then push the max of small to large
        small.offer(num);
        large.offer(small.poll());
        
        // 2. Balance: ensure small is always >= large in size
        if (small.size() < large.size()) {
            small.offer(large.poll());
        }
    }
    
    public double findMedian() {
        if (small.size() > large.size()) {
            return small.peek();
        }
        return (small.peek() + large.peek()) / 2.0;
    }
}

4. 5-Minute "Video-Style" Walkthrough

  1. The "Aha!" Moment: We don't care about the exact sorted order of all numbers. We only care about the elements near the middle. Heaps are perfect for this.
  2. The Transfer Step: When adding a number, we can't just check the sizes. What if we add a huge number to the small heap? It would break the invariant. So, we always add to small, then pop the max from small and give it to large. This guarantees the "small vs large" separation.
  3. The Re-balance: After the transfer, large might be bigger than small. We simply pop from large back to small to restore balance.

5. Interview Discussion

  • Interviewer: "What is the time complexity?"
  • You: "O(log N) for addNum because we do at most 3 heap operations. findMedian is O(1)."
  • Interviewer: "What if the stream only contains numbers from 0 to 100?"
  • You: "We can optimize space and time using a Counting Sort array (size 101). Tracking the median becomes an O(100) -> O(1) scan of the frequencies."

Want to track your progress?

Sign in to save your progress, track completed lessons, and pick up where you left off.