XGBoost is gradient boosting plus three ideas: a regularized objective, second-order optimization, and serious systems engineering. Understanding it tells you most of what's different about modern GBMs (LightGBM, CatBoost).
Regularized objective
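$$\mathcal{L} = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k), \qquad \Omega(f) = \gamma T + \tfrac{1}{2} \lambda \lVert w \rVert^2$$

Here $T$ is the number of leaves and $w$ is the vector of leaf scores. The $\gamma T$ penalty discourages adding leaves (similar to pruning); the $\tfrac{1}{2}\lambda \lVert w \rVert^2$ term shrinks leaf scores.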
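As a quick numeric check of the penalty, the sketch below evaluates $\Omega(f)$ for one tree from its leaf scores. `omega` is a hypothetical helper, not part of XGBoost's API. It takes `lam` rather than `lambda` because `lambda` is a Python keyword. In the library itself, $\gamma$ and $\lambda$ correspond to the `gamma` and `lambda` (alias `reg_lambda`) parameters.

```python
def omega(leaf_scores, gamma, lam):
    """Omega(f) = gamma * T + 0.5 * lambda * ||w||^2 for a single tree (hypothetical helper)."""
    T = len(leaf_scores)                   # number of leaves
    l2 = sum(w * w for w in leaf_scores)   # squared L2 norm of the leaf scores
    return gamma * T + 0.5 * lam * l2


# The same nonzero scores spread over more leaves cost more, via the gamma * T term.
print(omega([0.5, -0.5], gamma=1.0, lam=1.0))            # 2.25
print(omega([0.5, -0.5, 0.0, 0.0], gamma=1.0, lam=1.0))  # 4.25
```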
Second-order Taylor expansion
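Approximate the loss around the current predictions $\hat{y}_i^{(t-1)}$ when adding tree $f_t$:

$$\mathcal{L}^{(t)} \approx \sum_{i=1}^{n} \left[ l\big(y_i, \hat{y}_i^{(t-1)}\big) + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t(x_i)^2 \right] + \Omega(f_t)$$

where $g_i = \partial_{\hat{y}^{(t-1)}} l\big(y_i, \hat{y}_i^{(t-1)}\big)$ is the gradient and $h_i = \partial^2_{\hat{y}^{(t-1)}} l\big(y_i, \hat{y}_i^{(t-1)}\big)$ the Hessian. Write $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$ for the instances $I_j$ in leaf $j$. Dropping the constant loss term, the objective becomes $\gamma T$ plus a sum of independent quadratics $G_j w_j + \tfrac{1}{2}(H_j + \lambda) w_j^2$. The optimal leaf value therefore has a closed form:

$$w_j^* = -\frac{G_j}{H_j + \lambda}$$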
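Here is a minimal sketch of the leaf-value computation. It assumes logistic loss on a raw margin $\hat{y}_i$, where $p_i = \sigma(\hat{y}_i)$, $g_i = p_i - y_i$ and $h_i = p_i(1 - p_i)$. The helper names are illustrative, not XGBoost internals.

```python
import math

# Hypothetical helpers for illustration; not XGBoost's internal API.

def logistic_grad_hess(y, margin):
    """Gradient and Hessian of log loss w.r.t. the raw margin (pre-sigmoid score)."""
    p = 1.0 / (1.0 + math.exp(-margin))
    return p - y, p * (1.0 - p)


def optimal_leaf_weight(grads, hessians, lam):
    """Closed-form leaf value w* = -G / (H + lambda)."""
    G = sum(grads)
    H = sum(hessians)
    return -G / (H + lam)


# Three positives and one negative in the same leaf, all at margin 0 (p = 0.5).
labels = [1, 1, 1, 0]
stats = [logistic_grad_hess(y, 0.0) for y in labels]
g = [gi for gi, _ in stats]
h = [hi for _, hi in stats]

print(optimal_leaf_weight(g, h, lam=1.0))  # G = -1.0, H = 1.0 -> 0.5
print(optimal_leaf_weight(g, h, lam=0.0))  # no shrinkage -> 1.0
```

Raising `lam` pulls the leaf value toward zero. This is the shrinkage effect of the $\lambda$ term, applied directly inside the leaf solution.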
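Plugging the optimal leaf values back in gives a gain formula. When a leaf is split into left and right children with gradient and Hessian sums $(G_L, H_L)$ and $(G_R, H_R)$, the gain is:

$$\text{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right] - \gamma$$

Splits with $\text{Gain} < 0$ are pruned, so a split must reduce the loss by at least $\gamma$ to survive. This is principled regularization baked into the split criterion.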
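The gain formula drives an exact greedy search. You sort the instances by one feature, then sweep left to right while accumulating $G_L$ and $H_L$, so each candidate threshold costs O(1). This sketch makes three simplifications:

- It assumes squared-error loss, so $g_i = \hat{y}_i - y_i$ and $h_i = 1$.
- It re-sorts on every call, whereas XGBoost's column blocks keep features pre-sorted.
- It places the threshold at the midpoint between adjacent values.

The function names are hypothetical.

```python
# Hypothetical helpers for illustration; not XGBoost's internal API.

def split_gain(GL, HL, GR, HR, lam, gamma):
    """Gain of splitting one leaf into left/right children (second-order formula)."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


def best_split_exact(x, g, h, lam=1.0, gamma=0.0):
    """Exact greedy search on one feature.

    Returns (threshold, gain); threshold is None when no split has positive gain,
    i.e. the split would be pruned.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    G, H = sum(g), sum(h)
    GL = HL = 0.0
    best_threshold, best_gain = None, 0.0
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        GL += g[i]
        HL += h[i]
        if x[i] == x[nxt]:
            continue  # no threshold fits between equal feature values
        gain = split_gain(GL, HL, G - GL, H - HL, lam, gamma)
        if gain > best_gain:
            best_threshold, best_gain = (x[i] + x[nxt]) / 2, gain  # midpoint convention
    return best_threshold, best_gain


# Squared error with all predictions at 0: g = pred - y, h = 1.
x = [1.0, 2.0, 3.0, 4.0]
y = [0.0, 0.0, 1.0, 1.0]
g = [0.0 - yi for yi in y]
h = [1.0] * len(y)

print(best_split_exact(x, g, h, lam=1.0, gamma=0.0))  # threshold 2.5, gain 4/15
print(best_split_exact(x, g, h, lam=1.0, gamma=0.5))  # (None, 0.0): pruned
```

With `gamma=0.5`, the best raw improvement (about 0.27) no longer covers the per-leaf penalty, so no split is made.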
Systems tricks
- Column block storage: features sorted once, then reused across iterations. Splits become linear scans.
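- Approximate split finding: candidate split points come from a weighted quantile sketch, with each instance weighted by its Hessian $h_i$. This is needed when the data is too large for exact enumeration over every distinct value.
- Sparse-aware splits: each split learns a "default direction" for missing values by trying both children and keeping the higher-gain choice. No need to impute.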
- Cache-aware prefetching, out-of-core training: industrial-scale capability that random-forest implementations historically lacked.
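Sparsity-aware split finding can be sketched by sweeping only the non-missing values. At each threshold, the sketch scores two options: missing rows join the right child, or they join the left child. The better option becomes that split's default direction.

This simplified version assumes `None` marks a missing value and uses squared-error statistics for the example. XGBoost's own algorithm reaches the same result by scanning the non-missing entries in both sort orders. All names here are hypothetical.

```python
# Hypothetical helpers for illustration; not XGBoost's internal API.

def split_gain(GL, HL, GR, HR, lam, gamma):
    """Second-order split gain from the XGBoost formulation."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


def best_split_with_missing(x, g, h, lam=1.0, gamma=0.0):
    """One feature; None marks a missing value.

    Returns (threshold, default_direction, gain), where default_direction says
    which child the missing rows are sent to.
    """
    present = sorted((i for i in range(len(x)) if x[i] is not None), key=lambda i: x[i])
    missing = [i for i in range(len(x)) if x[i] is None]
    G, H = sum(g), sum(h)
    G_miss = sum(g[i] for i in missing)
    H_miss = sum(h[i] for i in missing)
    best = (None, None, 0.0)
    GL = HL = 0.0  # sums over present values at or below the current threshold
    for pos in range(len(present) - 1):
        i, nxt = present[pos], present[pos + 1]
        GL += g[i]
        HL += h[i]
        if x[i] == x[nxt]:
            continue
        threshold = (x[i] + x[nxt]) / 2
        # Option 1: missing rows go right. Option 2: missing rows go left.
        for direction, gl, hl in (("right", GL, HL), ("left", GL + G_miss, HL + H_miss)):
            gain = split_gain(gl, hl, G - gl, H - hl, lam, gamma)
            if gain > best[2]:
                best = (threshold, direction, gain)
    return best


x = [1.0, 2.0, None, 3.0, None]
y = [0.0, 0.0, 1.0, 1.0, 1.0]
g = [0.0 - yi for yi in y]  # squared error, predictions at 0
h = [1.0] * len(y)

print(best_split_with_missing(x, g, h))  # (2.5, 'right', 0.375)
```

The missing rows here all have label 1, so the search routes them right, alongside the other positives.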
What LightGBM and CatBoost change
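- LightGBM: histogram-based splits (continuous features bucketed into ~256 bins) → much faster training. Leaf-wise (best-first) growth always splits the leaf with the largest gain → deeper, unbalanced trees that can overfit small datasets unless `num_leaves` or `max_depth` is constrained.
- CatBoost: native categorical encoding using ordered target statistics. Each row's category is replaced by a prior-smoothed target statistic computed only from rows that precede it in a random permutation, so a row never sees its own label. This gives better default handling of categorical features without one-hot blowup.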
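Histogram-based split finding works in two phases. It buckets each feature once up front. Then, at each node, it accumulates $G$ and $H$ per bin and scans bins instead of instances.

This sketch uses simple equal-frequency edges. LightGBM's real bin construction is more involved, so treat `bin_feature` and `best_split_histogram` as hypothetical illustrations, not LightGBM's API. For comparison, the gain reuses the same second-order formula as above.

```python
import bisect

# Hypothetical helpers for illustration; not LightGBM's API.

def split_gain(GL, HL, GR, HR, lam, gamma):
    """Second-order split gain, reused from the XGBoost formulation."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


def bin_feature(x, max_bin):
    """Map values to bin ids using equal-frequency upper edges (done once, up front)."""
    s = sorted(x)
    n = len(s)
    edges = sorted({s[(k * n) // max_bin - 1] for k in range(1, max_bin + 1)})
    return [bisect.bisect_left(edges, v) for v in x], edges


def best_split_histogram(bins, n_bins, g, h, lam=1.0, gamma=0.0):
    """Build per-bin gradient/Hessian sums, then scan the bins (not the rows)."""
    G_hist = [0.0] * n_bins
    H_hist = [0.0] * n_bins
    for b, gi, hi in zip(bins, g, h):  # one O(n) pass per node
        G_hist[b] += gi
        H_hist[b] += hi
    G, H = sum(G_hist), sum(H_hist)
    GL = HL = 0.0
    best_bin, best_gain = None, 0.0
    for b in range(n_bins - 1):  # O(n_bins) candidate thresholds
        GL += G_hist[b]
        HL += H_hist[b]
        gain = split_gain(GL, HL, G - GL, H - HL, lam, gamma)
        if gain > best_gain:
            best_bin, best_gain = b, gain
    return best_bin, best_gain  # rows with bin <= best_bin go left


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
g = [0.0 - yi for yi in y]  # squared error, predictions at 0
h = [1.0] * len(y)

bins, edges = bin_feature(x, max_bin=4)
best_bin, gain = best_split_histogram(bins, len(edges), g, h)
print(edges, best_bin)  # [2.0, 4.0, 6.0, 8.0] 1 -> split at x <= 4.0
```

With 8 rows and 4 bins the saving is trivial. With millions of rows and ~256 bins, however, the threshold scan no longer grows with row count, and building each histogram is a single pass over the rows.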