Bias-Variance Tradeoff — Section 6: Model Evaluation

The expected squared error of a model on new data decomposes into three components:

$$E[(y - \hat{f}(x))^2] = \text{Bias}^2(\hat{f}) + \text{Var}(\hat{f}) + \sigma^2$$

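Bias is systematic error from an overly simple model: the gap between the average prediction (averaged over training sets) and the truth. Variance is how much the fitted model changes from one training sample to another. $\sigma^2$ is irreducible noise in the data, which no model can remove. Making a model more flexible usually lowers bias but raises variance, and vice versa.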
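The decomposition can be checked numerically. The sketch below is a minimal Monte Carlo simulation, not a general tool: the true function `true_f` (a sine curve), the noise level `sigma`, the test point `x0`, and the helper name `decompose` are all assumptions chosen for illustration. It refits a polynomial on many simulated training sets and compares the measured squared error at `x0` with the sum of the three estimated terms.

```python
import numpy as np

def true_f(x):
    # Assumed ground-truth function for the simulation (hypothetical choice)
    return np.sin(2 * np.pi * x)

def decompose(degree, x0=0.3, n_train=30, sigma=0.3, n_trials=2000, seed=0):
    """Monte Carlo estimate of (bias^2, variance, noise, total MSE) at x0."""
    rng = np.random.default_rng(seed)
    preds = np.empty(n_trials)
    sq_errs = np.empty(n_trials)
    for t in range(n_trials):
        # Draw a fresh training set and fit a polynomial of the given degree
        x = rng.uniform(0.0, 1.0, n_train)
        y = true_f(x) + rng.normal(0.0, sigma, n_train)
        coefs = np.polyfit(x, y, degree)
        preds[t] = np.polyval(coefs, x0)
        # Draw a new noisy observation at x0 and record the squared error
        y_new = true_f(x0) + rng.normal(0.0, sigma)
        sq_errs[t] = (y_new - preds[t]) ** 2
    bias2 = (preds.mean() - true_f(x0)) ** 2   # squared gap of the average fit
    variance = preds.var()                     # spread of fits across training sets
    noise = sigma ** 2                         # irreducible term, known here by construction
    return bias2, variance, noise, sq_errs.mean()

if __name__ == "__main__":
    for degree in (1, 3):
        b2, var, noise, mse = decompose(degree)
        print(f"degree={degree}: bias^2={b2:.3f} var={var:.3f} "
              f"noise={noise:.3f} sum={b2 + var + noise:.3f} mse={mse:.3f}")
```

Under these assumptions the sum of the three terms should agree with the measured error up to simulation noise, and the linear fit should show a much larger bias term than the cubic.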

High bias (underfitting)

Symptoms: training error is high, validation error is high (similar to training). The model can't capture the true relationship. Fix: use a more flexible model, add features, reduce regularization.

High variance (overfitting)

Symptoms: training error is low, validation error is much higher. The model memorized noise. Fix: simpler model, more data, more regularization, ensemble averaging.
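The two symptom patterns above can be turned into a rough rule of thumb. The helper below is a hypothetical sketch: the name `diagnose`, the `acceptable_err` target, and the `gap_tol` threshold are assumptions for illustration, and sensible values depend on the problem and the error metric.

```python
def diagnose(train_err, val_err, acceptable_err, gap_tol=0.2):
    """Classify a fit from its training and validation errors.

    acceptable_err: error level you would be satisfied with (assumed, problem-specific).
    gap_tol: fraction of the validation error that the train/validation gap
             may reach before it is flagged (assumed threshold).
    """
    # High training error means the model cannot even fit the data it saw
    high_bias = train_err > acceptable_err
    # A large train/validation gap means the fit does not generalize
    high_variance = (val_err - train_err) > gap_tol * val_err

    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias (underfitting)"
    if high_variance:
        return "high variance (overfitting)"
    return "no clear problem"

if __name__ == "__main__":
    print(diagnose(train_err=0.40, val_err=0.42, acceptable_err=0.10))
    print(diagnose(train_err=0.02, val_err=0.30, acceptable_err=0.10))
```

The first call matches the underfitting pattern (both errors high and close together), the second the overfitting pattern (low training error, much higher validation error).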

The tradeoff in action

Polynomial degree 1 (linear): high bias if the truth is curved
Polynomial degree 20: low bias but high variance; it overfits even with moderate noise
Optimal: somewhere between, found via cross-validation
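Choosing the degree by cross-validation might look like the sketch below. It uses a hand-rolled k-fold loop with NumPy's `Polynomial.fit`; the synthetic data (a noisy sine curve), the candidate degrees, and the helper names `cv_mse` and `select_degree` are assumptions for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

def cv_mse(x, y, degree, n_folds=5, seed=0):
    """Mean validation MSE of a degree-`degree` polynomial under k-fold CV."""
    rng = np.random.default_rng(seed)
    # Same seed for every degree, so all degrees are scored on the same folds
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    fold_errors = []
    for k in range(n_folds):
        val_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # Polynomial.fit rescales x internally, which keeps high degrees better conditioned
        model = Polynomial.fit(x[train_idx], y[train_idx], degree)
        residuals = y[val_idx] - model(x[val_idx])
        fold_errors.append(np.mean(residuals ** 2))
    return float(np.mean(fold_errors))

def select_degree(x, y, degrees):
    """Return the degree with the lowest CV error, plus all CV errors."""
    errors = {d: cv_mse(x, y, d) for d in degrees}
    best = min(errors, key=errors.get)
    return best, errors

# Synthetic data: curved truth plus moderate noise (assumed setup)
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, 40)

best_degree, cv_errors = select_degree(x, y, [1, 2, 3, 5, 8, 12, 20])

if __name__ == "__main__":
    for d, err in cv_errors.items():
        print(f"degree {d:2d}: CV MSE = {err:.3f}")
    print("selected degree:", best_degree)
```

On data like this, the linear fit is penalized for bias and the degree-20 fit for variance, so the selected degree should land between the two extremes.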

Bagging reduces variance

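Averaging many overfit models, each fit on a different bootstrap sample of the training data, reduces variance with little change in bias. The reduction is largest when the individual models are only weakly correlated. Random forests exploit this directly by also randomizing the features considered at each split.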
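A small simulation can illustrate the variance reduction. The sketch below swaps in a 1-nearest-neighbor regressor as the high-variance base learner instead of a tree, to keep the code short; that choice, the sine-curve data, and the function names are assumptions for illustration. It compares the variance of a single model's prediction at one point with that of a bagged average, across many simulated training sets.

```python
import numpy as np

def one_nn_predict(x_train, y_train, x0):
    # 1-nearest-neighbor regression: return the label of the closest training point
    return y_train[np.argmin(np.abs(x_train - x0))]

def bagged_predict(x_train, y_train, x0, n_models, rng):
    # Average the predictions of models fit on bootstrap resamples
    n = len(x_train)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, n)  # sample n indices with replacement
        preds.append(one_nn_predict(x_train[idx], y_train[idx], x0))
    return np.mean(preds)

def compare_variance(n_trials=500, n_train=50, n_models=50,
                     sigma=0.3, x0=0.3, seed=0):
    """Variance at x0 of a single 1-NN model vs. a bagged ensemble."""
    rng = np.random.default_rng(seed)
    single, bagged = [], []
    for _ in range(n_trials):
        # Fresh training set each trial (assumed truth: a sine curve)
        x = rng.uniform(0.0, 1.0, n_train)
        y = np.sin(2 * np.pi * x) + rng.normal(0.0, sigma, n_train)
        single.append(one_nn_predict(x, y, x0))
        bagged.append(bagged_predict(x, y, x0, n_models, rng))
    return float(np.var(single)), float(np.var(bagged))

if __name__ == "__main__":
    var_single, var_bagged = compare_variance()
    print(f"single model variance: {var_single:.4f}")
    print(f"bagged model variance: {var_bagged:.4f}")
```

In this setup the bagged prediction is effectively a weighted average over the few nearest neighbors, so its variance should come out well below that of the single model.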

Boosting reduces bias

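Sequentially fitting trees to the residuals of the current ensemble adds capacity at each round, reducing bias. Each shallow tree has high bias but low variance on its own, so boosting starts from a stable base and drives bias down step by step. Too many rounds can eventually overfit, so the number of rounds is usually chosen by validation.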
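The residual-fitting loop can be sketched in a few lines. The code below is a minimal illustration for squared error with depth-1 trees (stumps) on a single feature, not a production implementation; the learning rate `lr`, the data, and the function names are assumptions.

```python
import numpy as np

def fit_stump(x, r):
    """Least-squares regression stump on 1-D x: returns (threshold, left_value, right_value)."""
    order = np.argsort(x)
    xs, rs = x[order], r[order]
    best_sse, best_stump = np.inf, None
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # cannot split between identical x values
        left, right = rs[:i], rs[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse = sse
            best_stump = ((xs[i - 1] + xs[i]) / 2, left.mean(), right.mean())
    return best_stump

def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def boost(x, y, n_rounds=50, lr=0.5):
    """Fit stumps to residuals in sequence; return the stumps and training MSE per round."""
    pred = np.full_like(y, y.mean())  # start from the constant model
    stumps, train_mse = [], []
    for _ in range(n_rounds):
        stump = fit_stump(x, y - pred)               # fit the current residuals
        pred = pred + lr * stump_predict(stump, x)   # take a damped step
        stumps.append(stump)
        train_mse.append(float(np.mean((y - pred) ** 2)))
    return stumps, train_mse

# Synthetic data: curved truth plus noise (assumed setup)
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 100)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, 100)
stumps, train_mse = boost(x, y)

if __name__ == "__main__":
    for rounds in (1, 5, 20, 50):
        print(f"rounds={rounds:2d}: training MSE = {train_mse[rounds - 1]:.3f}")
```

A single stump cannot follow the curve, but the training error falls as rounds accumulate, which is the bias reduction described above. Training error alone says nothing about overfitting, so in practice the round count would be checked against held-out data.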

More data helps variance, not bias

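For many estimators, variance shrinks roughly in proportion to 1/n, so doubling the training set roughly halves model variance. Bias is essentially unchanged. If your model has high bias, getting more data won't help; you need a more flexible model. If it has high variance, more data will help.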
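This claim can also be checked by simulation. The sketch below fits a deliberately underpowered linear model to a curved truth at several training-set sizes and estimates bias squared and variance of the prediction at one point; the sine-curve truth, the test point `x0`, and the function name are assumptions for illustration.

```python
import numpy as np

def bias2_and_variance(n_train, degree=1, x0=0.3, sigma=0.3,
                       n_trials=2000, seed=0):
    """Estimate bias^2 and variance at x0 of a polynomial fit for a given training size."""
    rng = np.random.default_rng(seed)
    preds = np.empty(n_trials)
    for t in range(n_trials):
        # Fresh training set of the requested size (assumed truth: a sine curve)
        x = rng.uniform(0.0, 1.0, n_train)
        y = np.sin(2 * np.pi * x) + rng.normal(0.0, sigma, n_train)
        preds[t] = np.polyval(np.polyfit(x, y, degree), x0)
    truth = np.sin(2 * np.pi * x0)
    return float((preds.mean() - truth) ** 2), float(preds.var())

if __name__ == "__main__":
    for n in (50, 100, 200):
        b2, var = bias2_and_variance(n)
        print(f"n={n:3d}: bias^2={b2:.4f} variance={var:.4f}")
```

Under these assumptions, each doubling of `n` should roughly halve the variance column while the bias column stays about the same, since a straight line cannot fit a sine curve however much data it sees.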