Union-Find (Disjoint Set Union) — Section 8: Graphs

Union-find maintains a partition of $n$ elements into disjoint sets. It supports two operations: find (which set is $x$ in?) and union (merge the sets containing $x$ and $y$). With path compression and union by rank, both run in $O(\alpha(n))$ amortized time, which is constant for practical purposes.

The basic structure

Each element points to a "parent." Sets are trees, identified by their root. `find(x)` walks up to the root. `union(x, y)` finds both roots and points one at the other.
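As a minimal sketch of the structure just described, with no optimizations yet: the class name `NaiveUnionFind`, the convention that a root is its own parent, and the boolean return value of `union` are illustrative assumptions, not part of the notes above.

```python
class NaiveUnionFind:
    """Parent-pointer forest with no optimizations (illustrative names)."""

    def __init__(self, n):
        # Every element starts as the root of its own one-element tree.
        # Convention assumed here: a root is its own parent.
        self.parent = list(range(n))

    def find(self, x):
        # Walk parent pointers until reaching a node that is its own parent.
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        # Merge the sets of x and y; return False if they were already merged.
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        self.parent[rx] = ry  # point one root at the other
        return True


uf = NaiveUnionFind(5)
uf.union(0, 1)
uf.union(3, 4)
print(uf.find(0) == uf.find(1))  # True
print(uf.find(1) == uf.find(3))  # False
```

Without the two heuristics below, a bad sequence of unions can build a chain, so a single `find` may take $O(n)$ steps.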

Path compression

During `find(x)`, after walking up to the root, redirect every node on the path to point directly at the root. A later find from any of those nodes reaches the root in a single step, for as long as that root is not itself attached beneath another root.
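One way to sketch this is a two-pass iterative `find`: the first pass locates the root, the second repoints the path. The function signature (a bare `parent` list passed in) is an assumption made to keep the example short.

```python
def find(parent, x):
    """Return the root of x, compressing the path from x to the root."""
    # First pass: locate the root (a node that is its own parent).
    root = x
    while parent[root] != root:
        root = parent[root]
    # Second pass: repoint every node on the path directly at the root.
    while parent[x] != root:
        nxt = parent[x]
        parent[x] = root
        x = nxt
    return root


parent = [1, 2, 3, 3]   # a chain 0 -> 1 -> 2 -> 3, with 3 as the root
print(find(parent, 0))  # 3
print(parent)           # [3, 3, 3, 3]
```

The iterative form avoids hitting Python's recursion limit on long chains, which a recursive version could do.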

Union by rank (or size)

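When merging, attach the root of the shorter tree beneath the root of the taller one (or the root of the smaller set beneath that of the larger). Either rule keeps tree heights at $O(\log n)$, because a tree's height only grows when it is merged with one at least as tall (or as large).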
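The rank rule can be sketched as follows. Path compression is deliberately left out so that `rank` equals the exact tree height and the effect of the heuristic is visible in isolation; the class name `RankedUnionFind` and its fields are hypothetical.

```python
class RankedUnionFind:
    """Union by rank only (no path compression), for illustration."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n  # height of the tree rooted at each root

    def find(self, x):
        # Plain walk to the root.
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        # Make rx the root with the larger (or equal) rank.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx  # shorter tree goes under the taller one
        # Height grows only when two trees of equal rank are merged.
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True


uf = RankedUnionFind(8)
for a, b in [(0, 1), (2, 3), (4, 5), (6, 7), (0, 2), (4, 6), (0, 4)]:
    uf.union(a, b)
print(uf.rank[uf.find(7)])  # 3, i.e. log2(8)
```

Union by size works the same way with a `size` array in place of `rank`: attach the smaller set under the larger and add the sizes.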

Why $O(\alpha(n))$

Together, path compression and union by rank give an amortized cost per operation of $O(\alpha(n))$, where $\alpha$ is the inverse Ackermann function. For any practical $n$ (up to $10^{80}$ or so), $\alpha(n) \leq 4$, so each operation takes effectively constant time.

Applications

Kruskal's MST: sort edges by weight; for each edge, if its two endpoints are in different sets, take the edge and union the sets.
Connected components on the fly: as edges are added to a graph, maintain components in $O(\alpha(n))$ amortized time per edge.
Offline queries: many "after these operations, is X connected to Y" problems can be answered in $O((Q + E) \alpha(n))$ by processing in the right order (for example, handling edge deletions in reverse so that they become insertions).
Cycle detection: when adding an edge to an undirected graph, if both endpoints are already in the same set, the edge would create a cycle.
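As a rough sketch of the first and last applications together, here is Kruskal's algorithm with an inline union-find. The edge format `(weight, u, v)`, the function name `kruskal`, and the use of union by size with path halving (a compression variant that repoints each visited node to its grandparent) are choices made for this example.

```python
def kruskal(n, edges):
    """Return (total_weight, chosen_edges) of a minimum spanning forest.

    edges: iterable of (weight, u, v) tuples with 0 <= u, v < n.
    """
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    total, chosen = 0, []
    for w, u, v in sorted(edges):  # lightest edges first
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # cycle detection: endpoints already connected
        # Union by size: attach the smaller set under the larger.
        if size[ru] < size[rv]:
            ru, rv = rv, ru
        parent[rv] = ru
        size[ru] += size[rv]
        total += w
        chosen.append((u, v, w))
    return total, chosen


edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]
print(kruskal(4, edges))  # (7, [(0, 1, 1), (1, 2, 2), (2, 3, 4)])
```

The edge `(3, 0, 2)` is skipped because vertices 0 and 2 are already in the same set when it is considered. The same `ru == rv` test is the cycle check from the list above, and counting successful unions gives the number of components as `n - len(chosen)`.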

What it CAN'T do

Splits: once you've merged two sets, the union-find structure can't separate them again. If you need both merging and splitting, you need a different data structure, such as link-cut trees, which support both linking and cutting edges in a forest.