Given a stream of numbers, return the median after each new value. The data structure to know: two heaps balanced against each other.
The setup
- A max-heap holds the lower half of the values
- A min-heap holds the upper half
- Invariant: max-heap's size equals min-heap's size, or is exactly one larger
Inserting a new value
If the new value is less than or equal to the max-heap's top, push to the max-heap; otherwise push to the min-heap. Then rebalance: if one heap is more than one larger than the other, pop from the bigger and push to the smaller.
Reading the median
If the sizes are equal, the median is the average of the two heaps' tops. If the max-heap is bigger, its top is the median (odd-sized stream).
Complexity
Each insert is (heap push plus possible rebalance). Median lookup is — just read the heap tops. For a stream of values, total time is .
Why two heaps work
Together, they partition the data around the median. We don't need the data sorted — just the two extreme values that straddle the middle. Heaps give us cheap access to exactly those, in exchange for not caring about the order of the rest.
Variations
Sliding-window median is trickier because you also need to evict outgoing elements, which requires a deletion-supporting structure (an indexed heap or a balanced BST / multiset). For a non-sliding running median, two heaps suffice.