Regularization — Section 11: Regression Analysis

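Regularization adds a penalty term to the OLS loss to discourage large coefficients, accepting a small amount of bias in exchange for lower variance. The penalty weight $\lambda \ge 0$ controls the strength of the shrinkage; $\lambda = 0$ recovers OLS.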

Ridge regression uses an $L_2$ penalty:

$$\min_\beta \|y - X\beta\|^2 + \lambda \|\beta\|_2^2$$

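It shrinks all coefficients toward zero (without setting any exactly to zero) and has a closed-form solution, $\hat\beta = (X^\top X + \lambda I)^{-1} X^\top y$. It is especially useful when predictors are correlated, because adding $\lambda I$ keeps the matrix being inverted well conditioned.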
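
The closed form can be sketched in a few lines of NumPy. This is a minimal illustration, assuming no intercept and predictors already on comparable scales; the function name `ridge_closed_form` and the synthetic data are hypothetical, chosen only to show the effect on nearly collinear columns.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Minimize ||y - X b||^2 + lam * ||b||_2^2 via the normal equations."""
    n_features = X.shape[1]
    # Solve (X^T X + lam * I) b = X^T y; solve() avoids an explicit inverse.
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Synthetic data (assumption): the third column nearly duplicates the first.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + 0.01 * rng.normal(size=200)
y = X @ np.array([1.0, -2.0, 1.0]) + 0.5 * rng.normal(size=200)

beta_ols = ridge_closed_form(X, y, lam=0.0)     # lam = 0 is plain OLS
beta_ridge = ridge_closed_form(X, y, lam=10.0)  # penalized fit

print("OLS coefficients:  ", beta_ols)
print("Ridge coefficients:", beta_ridge)
```

With nearly collinear columns the OLS estimates for the duplicated pair tend to be unstable, while the ridge estimates have a smaller norm and split the shared effect more evenly.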

Lasso uses an $L_1$ penalty:

$$\min_\beta \|y - X\beta\|^2 + \lambda \|\beta\|_1$$

Lasso drives some coefficients to exactly zero, so it doubles as variable selection. The optimization is convex but has no closed-form solution in general; it is typically solved iteratively, for example with coordinate descent.
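
A minimal sketch of coordinate descent for the objective above, assuming no intercept and a fixed number of sweeps instead of a convergence check. The names `soft_threshold` and `lasso_coordinate_descent` and the synthetic data are hypothetical. Because the squared error here carries no factor of 1/2, the soft-threshold level works out to $\lambda / 2$.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0 when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iters=200):
    """Minimize ||y - X b||^2 + lam * ||b||_1 one coordinate at a time."""
    n_features = X.shape[1]
    beta = np.zeros(n_features)
    col_sq = (X ** 2).sum(axis=0)  # x_j^T x_j for each column
    for _ in range(n_iters):
        for j in range(n_features):
            # Partial residual: remove every feature's fit except feature j's.
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j
            # Exact minimizer of the one-dimensional subproblem in beta[j].
            beta[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
    return beta

# Synthetic data (assumption): only features 0 and 3 carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
beta_true = np.array([3.0, 0.0, 0.0, -2.0, 0.0])
y = X @ beta_true + 0.1 * rng.normal(size=100)

beta_hat = lasso_coordinate_descent(X, y, lam=50.0)
print("Estimated coefficients:", beta_hat)
print("Selected features:", np.flatnonzero(beta_hat))
```

The exact zeros come from the soft-threshold step: whenever a feature's correlation with the partial residual falls below $\lambda / 2$ in absolute value, its coefficient is set to zero rather than merely made small.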

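Elastic Net combines the $L_1$ and $L_2$ penalties. It tends to work better than Lasso when groups of correlated predictors should enter or leave the model together, since Lasso alone tends to pick one predictor from such a group somewhat arbitrarily.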
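
In the same notation as the Ridge and Lasso objectives, one common way to write the Elastic Net objective is shown below. The two separate weights $\lambda_1, \lambda_2 \ge 0$ are notation assumed here for illustration; software packages often use a different but equivalent parametrization (an overall strength plus a mixing ratio).

$$\min_\beta \|y - X\beta\|^2 + \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|_2^2$$

Setting $\lambda_2 = 0$ recovers Lasso and setting $\lambda_1 = 0$ recovers Ridge.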

Regularization accepts a small increase in bias in return for a reduction in variance that is often much larger, which usually improves out-of-sample MSE. That is exactly what matters in trading: prediction quality on tomorrow's data, not in-sample $R^2$.
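
The gap between in-sample and out-of-sample error can be illustrated with a small simulation. This is a sketch under stated assumptions: synthetic Gaussian data, many weak predictors relative to the sample size, and ridge as the regularizer. The helper names `fit_ridge` and `mse` are hypothetical. Which $\lambda$ does best out of sample depends on the data, so in practice it would be chosen by cross-validation.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Ridge estimate via the normal equations (lam = 0 gives OLS)."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

def mse(X, y, beta):
    """Mean squared prediction error of the linear fit X @ beta."""
    return float(np.mean((y - X @ beta) ** 2))

# Synthetic setup (assumption): 40 weak predictors, only 60 training rows.
rng = np.random.default_rng(42)
n_train, n_test, p = 60, 1000, 40
beta_true = rng.normal(scale=0.2, size=p)
X_train = rng.normal(size=(n_train, p))
X_test = rng.normal(size=(n_test, p))
y_train = X_train @ beta_true + rng.normal(size=n_train)
y_test = X_test @ beta_true + rng.normal(size=n_test)

results = {}
for lam in [0.0, 1.0, 10.0, 100.0]:
    beta = fit_ridge(X_train, y_train, lam)
    results[lam] = (mse(X_train, y_train, beta), mse(X_test, y_test, beta))
    print(f"lambda={lam:6.1f}  train MSE={results[lam][0]:.3f}  "
          f"test MSE={results[lam][1]:.3f}")
```

Training MSE can only rise as $\lambda$ grows, because OLS minimizes it by construction. The test MSE is the number to watch: in a setting like this one would expect a moderate $\lambda$ to beat OLS, though that outcome is not guaranteed for every dataset.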
