K-Means Clustering — Section 7: Unsupervised Learning

K-means clusters points into K groups by alternating between assigning points to the nearest cluster center and recomputing centers as the mean of their assigned points. The most-used clustering algorithm — fast, simple, and good when clusters are roughly spherical.

The algorithm

1. Initialize K cluster centers (randomly or with K-means++) 2. Assign each point to its nearest center 3. Update each center to be the mean of its assigned points 4. Repeat 2-3 until assignments stop changing

Converges to a local optimum in $O(I \cdot K \cdot N \cdot D)$ for $I$ iterations.

K-means++

Random initialization can give bad clusterings. K-means++ picks initial centers more carefully: start with one random point, then iteratively pick the next center proportional to its squared distance from the nearest existing center. Provably $O(\log K)$ -competitive with the global optimum.

Choosing K

The hard problem. Methods:

Elbow method: plot within-cluster sum of squares vs K; look for an "elbow" where the marginal gain flattens
Silhouette score: measures how similar each point is to its cluster vs neighboring clusters; higher is better
Gap statistic: compares within-cluster dispersion to that expected under a null distribution

In practice, K-means is often used as part of a larger pipeline where K is dictated by downstream needs.

Assumptions and failure modes

K-means assumes clusters are roughly spherical, similar size, and similar density. Fails on:

Non-convex shapes (concentric rings, half-moons)
Very different cluster sizes
Strong noise / outliers (mean is sensitive)
High-dimensional data (Euclidean distance becomes uninformative)

For these, use DBSCAN, spectral clustering, or Gaussian mixtures.

Standardize features

K-means uses Euclidean distance, so features with larger scales dominate. Always standardize (or domain-normalize) before clustering.