Union-find (disjoint set)
Union-find (also called disjoint set union, DSU) maintains a partition of vertices into connected components under two fast operations: find (which component?) and union (merge components). It is foundational for Kruskal's MST, offline connectivity queries, and cycle checks in incremental undirected graphs.
- Parent-forest representation
find,union, and connected query- Path compression + union by rank
- C implementation and trace
1) DSU model
Each set is represented by a rooted tree. Array parent[x] stores a node's parent; a root is its own parent. find(x) returns the root representative. union(a,b) links one root under the other if they differ.
Initially: 0 1 2 3 4 5 (each is its own set) After union(0,1), union(1,2): 0<-1<-2 and 3 4 5 Representative(find(2)) = 0
2) Operations
| Operation | Meaning | Typical use |
|---|---|---|
make_set(x) | Create singleton set for x. | Initialization for all vertices. |
find(x) | Return component representative. | Check whether two vertices are already connected. |
union(a,b) | Merge sets containing a and b. | Add new edge between components. |
same(a,b) | find(a)==find(b)? | Cycle check in undirected edge insertion. |
3) Why it is fast
- Path compression: after
find(x), every visited node points directly to the root. - Union by rank/size: attach the smaller (or shallower) root under the larger root.
- Together they give amortized O(α(V)) per operation (inverse Ackermann, practically constant).
4) C — DSU with path compression + rank
#include <stdio.h> #define N 6 static int parent[N]; static int rnk[N]; void dsu_init(void) { for (int i = 0; i < N; i++) { parent[i] = i; /* singleton set: root is itself */ rnk[i] = 0; } } int dsu_find(int x) { if (parent[x] != x) parent[x] = dsu_find(parent[x]); /* path compression */ return parent[x]; } void dsu_union(int a, int b) { a = dsu_find(a); b = dsu_find(b); if (a == b) return; /* already same component */ /* union by rank */ if (rnk[a] < rnk[b]) { int t = a; a = b; b = t; } parent[b] = a; if (rnk[a] == rnk[b]) rnk[a]++; } int same_set(int a, int b) { return dsu_find(a) == dsu_find(b); } int main(void) { dsu_init(); dsu_union(0, 1); dsu_union(1, 2); dsu_union(3, 4); printf("same(0,2) = %d\n", same_set(0, 2)); /* 1 */ printf("same(2,4) = %d\n", same_set(2, 4)); /* 0 */ dsu_union(2, 4); printf("same(0,4) = %d\n", same_set(0, 4)); /* 1 */ return 0; }
5) Trace — component merges
| Step | Operation | Components (informal) |
|---|---|---|
| Init | make_set(0..5) | {0} {1} {2} {3} {4} {5} |
| 1 | union(0,1) | {0,1} {2} {3} {4} {5} |
| 2 | union(1,2) | {0,1,2} {3} {4} {5} |
| 3 | union(3,4) | {0,1,2} {3,4} {5} |
| 4 | same(2,4) | false (different reps) |
| 5 | union(2,4) | {0,1,2,3,4} {5} |
6) Graph uses
- Kruskal MST: include edge only if endpoints are in different sets.
- Cycle detection (undirected): adding edge
(u,v)forms a cycle ifsame(u,v)is already true. - Offline connectivity: process unions, answer whether two nodes are connected.
Key takeaway
Union-find is a tiny but powerful structure: two arrays plus two optimizations give near-constant component operations. It is the engine behind Kruskal's algorithm and many connectivity problems.
Related: Kruskal's algorithm · Minimum spanning tree · Cycle detection · Graph representation