Union-find maintains a partition of elements into disjoint sets. It supports two operations: find (which set is in?) and union (merge the sets containing and ). With the right tricks, both are — practically constant.
The basic structure
Each element points to a "parent." Sets are trees, identified by their root. `find(x)` walks up to the root. `union(x, y)` finds both roots and points one at the other.
Path compression
During `find(x)`, after walking up to the root, redirect every node on the path to point directly at the root. Future finds from those nodes are .
Union by rank (or size)
When merging, attach the shorter tree to the taller (or the smaller set to the larger). Keeps tree heights logarithmic.
Why $O(\alpha(n))$
Together, path compression and union-by-rank give an amortized cost per operation of , where is the inverse Ackermann function. For any practical (up to or so), . Effectively constant time.
Applications
- Kruskal's MST: sort edges by weight; for each edge, if its two endpoints are in different sets, take the edge and union the sets.
- Connected components on the fly: as edges are added to a graph, maintain components in per edge.
- Offline queries: many "after these operations, is X connected to Y" problems can be answered in by processing in the right order.
- Cycle detection: when adding an edge, if both endpoints are already in the same set, you'd create a cycle.
What it CAN'T do
Splits — once you've merged two sets, the union-find structure can't separate them again. If you need both merging and splitting, you need a different data structure (link-cut trees).