Principal Component Analysis — Section 12: Multivariate Statistics

Principal Component Analysis (PCA) decomposes a high-dimensional dataset into orthogonal directions ordered by how much variance each captures.

Given a centered data matrix $X$ with $n$ rows (one per observation), the principal components are the eigenvectors $v_i$ of the sample covariance matrix $\Sigma = X^\top X / (n - 1)$, and each eigenvalue $\lambda_i$ equals the variance of the data along $v_i$:

$$\Sigma\, v_i = \lambda_i\, v_i, \quad \lambda_1 \ge \lambda_2 \ge \dots \ge 0$$

The first principal component points in the direction of maximum variance. The second is orthogonal to the first and captures the next-most variance, and so on.
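
The definition above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: the function name `pca_eig` and the synthetic data are assumptions made for the example. It centers the data, forms $\Sigma = X^\top X / (n - 1)$, and sorts the eigenpairs so that $\lambda_1 \ge \lambda_2 \ge \dots$

```python
import numpy as np

def pca_eig(X):
    """Eigen-decompose the sample covariance of X (rows = observations).

    Returns eigenvalues in descending order and a matrix whose
    columns are the matching unit-length eigenvectors v_i.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                  # center each column
    cov = Xc.T @ Xc / (n - 1)                # Sigma = X^T X / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: symmetric input, ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to descending variance
    return eigvals[order], eigvecs[:, order]

# Synthetic data (an assumption for illustration): two strongly
# correlated features plus one independent noise feature.
rng = np.random.default_rng(0)
z = rng.normal(size=500)
X = np.column_stack([
    z + 0.1 * rng.normal(size=500),
    z + 0.1 * rng.normal(size=500),
    0.5 * rng.normal(size=500),
])

eigvals, eigvecs = pca_eig(X)
scores = (X - X.mean(axis=0)) @ eigvecs      # data expressed in the principal axes
explained = eigvals / eigvals.sum()          # fraction of total variance per component
print(explained)
```

The sample variance of each column of `scores` equals the corresponding eigenvalue, which is the sense in which the eigenvalues are "the variances along those directions." `np.linalg.eigh` is used rather than `np.linalg.eig` because the covariance matrix is symmetric.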

PCA is widely used in finance. The first few components of a global stock-return matrix reveal the systematic factors driving the market: the first usually looks like "the overall market," the second like "growth vs value," and so on. PCA-based factor models are foundational for risk management.
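
To see why the first component tends to look like "the overall market," here is a small sketch on simulated returns. Everything in it is an assumption for illustration: the returns are synthetic, generated from a single hypothetical market factor with made-up sensitivities (`betas`) plus independent noise, and no real market data is involved.

```python
import numpy as np

rng = np.random.default_rng(42)
n_days, n_assets = 2000, 8

# Hypothetical one-factor model: return = beta * market + idiosyncratic noise
betas = rng.uniform(0.5, 1.5, size=n_assets)           # assumed sensitivities
market = 0.01 * rng.normal(size=n_days)                # assumed market factor
idio = 0.005 * rng.normal(size=(n_days, n_assets))     # asset-specific noise
returns = market[:, None] * betas[None, :] + idio

# PCA on the sample covariance of the simulated returns
Rc = returns - returns.mean(axis=0)
cov = Rc.T @ Rc / (n_days - 1)
eigvals, eigvecs = np.linalg.eigh(cov)                 # ascending eigenvalues

pc1 = eigvecs[:, -1]                                   # direction of largest variance
pc1 = pc1 * np.sign(pc1.sum())                         # eigenvector sign is arbitrary; fix it
share = eigvals[-1] / eigvals.sum()                    # variance share of the first component

print("PC1 loadings:", np.round(pc1, 3))
print("PC1 variance share:", round(float(share), 3))
```

In this simulation every asset loads on the first component with the same sign, and the loadings track the assumed `betas`, which is the pattern usually described as a market factor. Real return data contain more than one factor, so later components need separate inspection.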

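Watch out: PCA is a linear method, it is sensitive to the scale of the variables, and its components have no inherent interpretability. If variables are measured in very different units, standardize them first (for example, to zero mean and unit variance); otherwise the variables with the largest numerical variance dominate the leading components.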
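
The scale sensitivity is easy to demonstrate. The sketch below uses synthetic data (an assumption for illustration) in which two features carry comparable information but one is recorded in units 1000 times larger. The helper `first_component` is a hypothetical name introduced for this example.

```python
import numpy as np

def first_component(X):
    """Unit eigenvector of the sample covariance with the largest eigenvalue."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues
    return eigvecs[:, -1]                    # last column = largest variance

rng = np.random.default_rng(1)
z = rng.normal(size=1000)
a = z + 0.5 * rng.normal(size=1000)              # feature in "small" units
b = 1000.0 * (z + 0.5 * rng.normal(size=1000))   # similar feature, units 1000x larger
X = np.column_stack([a, b])

# Raw data: the large-unit feature dominates the first component.
v_raw = first_component(X)

# Standardized data: each column has zero mean and unit sample variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
v_std = first_component(X_std)

print("raw PC1:         ", np.round(np.abs(v_raw), 4))
print("standardized PC1:", np.round(np.abs(v_std), 4))
```

On the raw data the first component points almost entirely along `b`. After standardization, both features receive equal weight in magnitude. Standardizing in this way is equivalent to running PCA on the correlation matrix instead of the covariance matrix.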
