The expected error of a model on new data decomposes into three components:
$$\text{Error} = \text{Bias}^2 + \text{Variance} + \sigma^2$$
Bias is systematic error from oversimplifying. Variance is sensitivity to noise in the training set. $\sigma^2$ is irreducible noise. Reducing bias usually increases variance, and vice versa.
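The decomposition can be checked numerically. The sketch below is a minimal simulation under stated assumptions: the target function `true_f`, the noise level `sigma`, the fixed training grid, and the polynomial model family are all illustrative choices, not part of the text above. It refits the model on many noisy training sets and estimates each term at a set of test points.

```python
import numpy as np

def true_f(x):
    # Assumed ground-truth function for the simulation (illustrative only).
    return np.sin(2 * np.pi * x)

def decompose(degree, n_train=30, n_trials=500, sigma=0.3, seed=0):
    """Monte Carlo estimate of (bias^2, variance, noise, total error)."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0, 1, n_train)   # fixed design: only the noise is redrawn
    x_test = np.linspace(0.05, 0.95, 50)
    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        y_train = true_f(x_train) + rng.normal(0, sigma, n_train)
        # Least-squares polynomial fit, expressed in a well-conditioned basis.
        model = np.polynomial.Chebyshev.fit(x_train, y_train, degree)
        preds[t] = model(x_test)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - true_f(x_test)) ** 2)
    variance = np.mean(preds.var(axis=0))
    # Total error is measured against fresh noisy targets at the test points.
    y_new = true_f(x_test) + rng.normal(0, sigma, preds.shape)
    total = np.mean((y_new - preds) ** 2)
    return bias_sq, variance, sigma ** 2, total

for degree in (1, 9):
    b, v, noise, total = decompose(degree)
    print(f"degree {degree}: bias^2={b:.3f} variance={v:.3f} "
          f"noise={noise:.3f} sum={b + v + noise:.3f} total={total:.3f}")
```

In this setup the straight line should be dominated by bias and the degree-9 fit by variance, while the sum of the three terms should track the measured total error up to simulation noise.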
High bias (underfitting)
Symptoms: training error is high, and validation error is similarly high. The model can't capture the true relationship. Fix: use a more flexible model, add features, or reduce regularization.
High variance (overfitting)
Symptoms: training error is low, but validation error is much higher. The model has memorized noise in the training set. Fix: use a simpler model, get more data, increase regularization, or average an ensemble.
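The two sets of symptoms can be turned into a rough rule of thumb. The helper below is a hypothetical sketch: the function name `diagnose`, the `target_err` you consider acceptable, and the `gap_ratio` threshold are all assumptions chosen for illustration, not standard values.

```python
def diagnose(train_err, val_err, target_err, gap_ratio=1.5):
    """Label a model from its training and validation error.

    target_err and gap_ratio are illustrative thresholds, not standard values.
    """
    high_bias = train_err > target_err               # cannot even fit the training set
    high_variance = val_err > gap_ratio * train_err  # large train/validation gap
    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias"
    if high_variance:
        return "high variance"
    return "ok"

print(diagnose(train_err=0.30, val_err=0.32, target_err=0.10))  # high bias
print(diagnose(train_err=0.02, val_err=0.25, target_err=0.10))  # high variance
```

In practice the thresholds depend on the problem and on how noisy the error estimates are, so treat the labels as a starting point for investigation rather than a verdict.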
The tradeoff in action
- Polynomial degree 1 (linear): high bias if the truth is curved
- Polynomial degree 20: low bias but high variance; it fits the noise even when the noise is moderate
- Optimal: somewhere in between, found via cross-validation
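The degree sweep in the list above can be sketched with k-fold cross-validation. This is a minimal illustration, assuming synthetic data (a sine target with Gaussian noise), five folds, and a hypothetical helper named `cv_mse`. The polynomial is fit in a Chebyshev basis purely for numerical stability; it spans the same functions as an ordinary polynomial of that degree.

```python
import numpy as np

def cv_mse(x, y, degree, k=5, seed=0):
    """Mean validation MSE of a polynomial of the given degree over k folds."""
    idx = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(idx, k)
    errors = []
    for i, val in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = np.polynomial.Chebyshev.fit(x[train], y[train], degree)
        errors.append(np.mean((model(x[val]) - y[val]) ** 2))
    return float(np.mean(errors))

# Synthetic curved data; the target function and noise level are assumptions.
rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, 60)
y = np.sin(3 * x) + rng.normal(0, 0.2, 60)

# Same fold split for every degree, so the scores are directly comparable.
scores = {d: cv_mse(x, y, d) for d in range(1, 21)}
best_degree = min(scores, key=scores.get)
print("best degree:", best_degree)
print("CV MSE at degree 1, best, 20:", scores[1], scores[best_degree], scores[20])
```

On data like this the validation error should be high at degree 1, drop for moderate degrees, and climb again toward degree 20, which is the U-shaped curve the tradeoff predicts.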
Bagging reduces variance
Averaging many overfit models, each fit on a different bootstrap sample of the training data, reduces variance while leaving bias roughly unchanged. Random forests exploit this directly.
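A small simulation can make the variance reduction visible. Note the substitution: instead of decision trees, the sketch below uses a 1-nearest-neighbor regressor as the overfit base model, because it is easy to write in a few lines and is similarly unstable. The target function, noise level, sample sizes, and number of bagged models are illustrative assumptions.

```python
import numpy as np

def true_f(x):
    # Assumed ground-truth function (illustrative only).
    return np.sin(2 * np.pi * x)

def predict_1nn(x_train, y_train, x_test):
    # Each test point takes the target of its nearest training point.
    nearest = np.abs(x_test[:, None] - x_train[None, :]).argmin(axis=1)
    return y_train[nearest]

def predict_bagged(x_train, y_train, x_test, n_models, rng):
    n = len(x_train)
    total = np.zeros(len(x_test))
    for _ in range(n_models):
        idx = rng.integers(0, n, n)   # bootstrap sample: n draws with replacement
        total += predict_1nn(x_train[idx], y_train[idx], x_test)
    return total / n_models

rng = np.random.default_rng(0)
x_test = np.linspace(0.1, 0.9, 20)
single, bagged = [], []
for _ in range(200):                  # 200 independent training sets
    x = rng.uniform(0, 1, 50)
    y = true_f(x) + rng.normal(0, 0.3, 50)
    single.append(predict_1nn(x, y, x_test))
    bagged.append(predict_bagged(x, y, x_test, 25, rng))

# Variance of the prediction across training sets, averaged over test points.
var_single = np.array(single).var(axis=0).mean()
var_bagged = np.array(bagged).var(axis=0).mean()
print(f"variance, single model: {var_single:.4f}")
print(f"variance, bagged model: {var_bagged:.4f}")
```

The bagged predictor should show clearly lower variance across training sets than the single model. Bagging a nearest-neighbor model also smooths the prediction slightly, so its bias is close to, but not exactly, that of the single model.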
Boosting reduces bias
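Sequentially fitting trees to the residuals of the current ensemble adds capacity with each round, which reduces bias. Each shallow tree has high bias but low variance on its own, so boosting starts from the low-variance side of the tradeoff and typically gives up a little variance for a large reduction in bias.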
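The residual-fitting loop can be sketched in a few lines for squared-error loss. This is a minimal illustration, not a production gradient-boosting implementation: the base learner is a depth-1 tree (a stump) on a single feature, and the helper names, learning rate, round count, and synthetic data are all assumptions.

```python
import numpy as np

def fit_stump(x, r):
    """Depth-1 regression tree: the split and leaf means minimizing squared error."""
    best = None
    for s in np.unique(x)[:-1]:       # thresholds that leave both sides non-empty
        left = x <= s
        lv, rv = r[left].mean(), r[~left].mean()
        sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, s, lv, rv)
    return best[1:]

def predict_stump(stump, x):
    s, lv, rv = stump
    return np.where(x <= s, lv, rv)

def boost(x, y, n_rounds, lr):
    base = y.mean()
    pred = np.full(len(x), base)
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(x, y - pred)              # fit the current residuals
        pred = pred + lr * predict_stump(stump, x)  # take a damped step toward them
        stumps.append(stump)
    return base, lr, stumps

def predict(model, x):
    base, lr, stumps = model
    out = np.full(len(x), base)
    for stump in stumps:
        out = out + lr * predict_stump(stump, x)
    return out

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 200)
grid = np.linspace(0, 1, 500)
truth = np.sin(2 * np.pi * grid)

one_stump = boost(x, y, n_rounds=1, lr=1.0)
boosted = boost(x, y, n_rounds=100, lr=0.3)
err_one = np.mean((predict(one_stump, grid) - truth) ** 2)
err_boosted = np.mean((predict(boosted, grid) - truth) ** 2)
print(f"error vs noise-free target, one stump: {err_one:.4f}")
print(f"error vs noise-free target, boosted:   {err_boosted:.4f}")
```

A single stump can only represent a two-level step, so its error against the noise-free target is mostly bias. After many rounds the ensemble should follow the curve closely, which is the bias reduction described above.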
More data helps variance, not bias
Doubling the training set roughly halves model variance but leaves bias unchanged. If your model has high bias, more data won't help; you need a more flexible model. If it has high variance, more data will.
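The scaling claim can be checked with a deliberately biased model. The sketch below fits a straight line to a curved target at several training set sizes and estimates bias and variance by simulation. The sine target, noise level, evenly spaced design, and function name `bias_variance` are illustrative assumptions.

```python
import numpy as np

def bias_variance(n_train, n_trials=2000, sigma=0.3, seed=0):
    """Bias^2 and variance of a straight-line fit to a curved target."""
    rng = np.random.default_rng(seed)
    x = (np.arange(n_train) + 0.5) / n_train   # evenly spaced fixed design on (0, 1)
    x_test = np.linspace(0.05, 0.95, 50)
    f_test = np.sin(2 * np.pi * x_test)
    # One column per simulated training set; only the noise differs between columns.
    Y = np.sin(2 * np.pi * x)[:, None] + rng.normal(0, sigma, (n_train, n_trials))
    coefs = np.polyfit(x, Y, 1)                # shape (2, n_trials): slope, intercept
    preds = np.vander(x_test, 2) @ coefs       # shape (50, n_trials)
    bias_sq = np.mean((preds.mean(axis=1) - f_test) ** 2)
    variance = np.mean(preds.var(axis=1))
    return bias_sq, variance

for n in (40, 80, 160):
    b, v = bias_variance(n)
    print(f"n={n:4d}  bias^2={b:.4f}  variance={v:.5f}")
```

Each doubling of `n` should cut the variance column roughly in half while the bias column stays essentially flat, so the total error of this underfit model barely moves.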