1. Problem Statement
The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value, and the median is the mean of the two middle values.
Implement the MedianFinder class:
MedianFinder()initializes theMedianFinderobject.void addNum(int num)adds the integernumfrom the data stream to the data structure.double findMedian()returns the median of all elements so far.
2. Approach: Max-Heap and Min-Heap
graph LR
subgraph "Small Half (Max-Heap)"
Max[Top: Largest of Small]
end
subgraph "Large Half (Min-Heap)"
Min[Top: Smallest of Large]
end
Median{Median?}
Max --- Median --- Min
Sorting the array on every addNum takes $O(n \log n)$. We need to do it faster.
- The Two Halves: Split the numbers into a "Small" half and a "Large" half.
- The "Small" half is represented by a Max-Heap (so we can get the largest of the smalls in $O(1)$).
- The "Large" half is represented by a Min-Heap (so we can get the smallest of the larges in $O(1)$).
- Balancing Act: The sizes of the two heaps can differ by at most 1.
- Median:
- If sizes are unequal, the median is the top of the larger heap.
- If sizes are equal, the median is the average of the tops.
3. Java Implementation
class MedianFinder {
private PriorityQueue<Integer> small; // Max-Heap
private PriorityQueue<Integer> large; // Min-Heap
public MedianFinder() {
small = new PriorityQueue<>(Collections.reverseOrder());
large = new PriorityQueue<>();
}
public void addNum(int num) {
// 1. Add to small, then push the max of small to large
small.offer(num);
large.offer(small.poll());
// 2. Balance: ensure small is always >= large in size
if (small.size() < large.size()) {
small.offer(large.poll());
}
}
public double findMedian() {
if (small.size() > large.size()) {
return small.peek();
}
return (small.peek() + large.peek()) / 2.0;
}
}
4. 5-Minute "Video-Style" Walkthrough
- The "Aha!" Moment: We don't care about the exact sorted order of all numbers. We only care about the elements near the middle. Heaps are perfect for this.
- The Transfer Step: When adding a number, we can't just check the sizes. What if we add a huge number to the
smallheap? It would break the invariant. So, we always add tosmall, then pop the max fromsmalland give it tolarge. This guarantees the "small vs large" separation. - The Re-balance: After the transfer,
largemight be bigger thansmall. We simply pop fromlargeback tosmallto restore balance.
5. Interview Discussion
- Interviewer: "What is the time complexity?"
- You: "O(log N) for
addNumbecause we do at most 3 heap operations.findMedianis O(1)." - Interviewer: "What if the stream only contains numbers from 0 to 100?"
- You: "We can optimize space and time using a Counting Sort array (size 101). Tracking the median becomes an O(100) -> O(1) scan of the frequencies."