Collision Resolution — Section 3: Hash Tables

Even with a good hash function, two keys can map to the same bucket. Hash tables handle these collisions with one of two strategies.

Chaining

Each bucket holds a linked list (or dynamic array) of all elements that hashed there. Lookup walks the chain. Average chain length is $\alpha$ , so expected work per operation is $O(1 + \alpha)$ . Used by Java's HashMap, Python's dict (with optimizations), and most introductory texts.

Pros: simple, gracefully handles high load factor. Cons: cache-unfriendly (chains live on the heap), per-element memory overhead from list nodes.

Open addressing

All elements live directly in the bucket array. On collision, probe to another slot using a deterministic rule: linear (try $h(k), h(k)+1, h(k)+2, \dots$ ), quadratic (try $h(k), h(k)+1, h(k)+4, h(k)+9, \dots$ ), or double hashing (use a second hash function as the step size).

Pros: cache-friendly (contiguous memory), no per-element overhead. Cons: deletion is hairy (you need a tombstone marker so probes for other keys still work), and performance degrades sharply as load factor approaches 1. Most implementations cap $\alpha$ at 0.5 - 0.7.

Clustering

Linear probing suffers from "primary clustering" — long runs of consecutive filled buckets that grow disproportionately. Quadratic probing avoids primary clustering but has secondary clustering (keys that hash to the same bucket follow the same probe sequence). Double hashing avoids both at the cost of two hash computations.

Which to choose

Modern high-performance hash maps (Google's Swiss tables, Facebook's F14, Rust's hashbrown) use open addressing with sophisticated probing and SIMD-accelerated lookup. Chaining is simpler to reason about but pays a real-world cost in cache misses.

Even with a good hash function, two keys can map to the same bucket. Hash tables handle these collisions with one of two strategies.

Chaining

Pros: simple, gracefully handles high load factor. Cons: cache-unfriendly (chains live on the heap), per-element memory overhead from list nodes.