Cross-validation estimates how a model will perform on unseen data by splitting the available data into training and validation parts. The simplest version is k-fold: divide the data into k groups, train on k - 1 of them and validate on the held-out group, rotate so that each group is held out exactly once, and average the k validation scores.
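The rotation can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the names k_fold_indices and cross_validate are made up for this example, the folds are contiguous and unshuffled (for i.i.d. data one would normally shuffle first), and the "model" is just the training mean so the example stays self-contained.

```python
from statistics import mean


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, unshuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        stop = start + base + (1 if fold < extra else 0)
        val_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, val_idx
        start = stop


def cross_validate(ys, k):
    """Average validation MSE of a mean-only predictor over k folds."""
    fold_scores = []
    for train_idx, val_idx in k_fold_indices(len(ys), k):
        # "Training" here is just taking the mean of the training targets.
        prediction = mean(ys[i] for i in train_idx)
        fold_scores.append(mean((ys[i] - prediction) ** 2 for i in val_idx))
    return mean(fold_scores)
```

Each index appears in exactly one validation fold, so every observation is scored once by a model that never saw it.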
Why bother? In-sample error typically underestimates out-of-sample error because the model has been tuned to the data it's evaluated on. Cross-validation reduces this optimism by scoring each fitted model only on data it was not trained on.
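An extreme case makes the gap visible. The sketch below uses a 1-nearest-neighbor predictor on synthetic data (a noisy line, chosen purely for illustration): evaluated in-sample, every point is its own nearest neighbor, so the error is exactly zero, while leave-one-out cross-validation (k-fold with k equal to the number of samples) reports a clearly nonzero error. The helper names and the data-generating process are assumptions of this example.

```python
import random


def nearest_neighbor_predict(train_x, train_y, x):
    """1-nearest-neighbor: return the target of the closest training point."""
    j = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[j]


def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


rng = random.Random(0)
xs = [i / 10 for i in range(40)]
ys = [x + rng.gauss(0, 1) for x in xs]  # synthetic: linear signal plus noise

# In-sample: the model is scored on the same points it memorized.
in_sample_error = mse([nearest_neighbor_predict(xs, ys, x) for x in xs], ys)

# Leave-one-out: each point is predicted by a model that never saw it.
loo_preds = []
for i in range(len(xs)):
    train_x = xs[:i] + xs[i + 1:]
    train_y = ys[:i] + ys[i + 1:]
    loo_preds.append(nearest_neighbor_predict(train_x, train_y, xs[i]))
held_out_error = mse(loo_preds, ys)

print(f"in-sample MSE: {in_sample_error:.3f}")
print(f"leave-one-out MSE: {held_out_error:.3f}")
```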
Beware of look-ahead bias when shuffling time series. For temporal data, use forward-chaining cross-validation (train on the past, validate on the future) rather than random splits, because random splits can leak future information into training.
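One way to generate forward-chaining splits is an expanding training window, sketched below under the assumption that the samples are already sorted by time. The function name forward_chaining_splits and the min_train parameter are hypothetical, introduced only for this example.

```python
def forward_chaining_splits(n_samples, n_splits, min_train=1):
    """Yield (train_idx, val_idx) where all training indices precede validation.

    Assumes samples are ordered by time. The training window expands with
    each split; the last validation block absorbs any leftover samples.
    """
    if n_splits < 1 or min_train < 1:
        raise ValueError("n_splits and min_train must be at least 1")
    fold_size = (n_samples - min_train) // n_splits
    if fold_size < 1:
        raise ValueError("not enough samples for the requested splits")
    for split in range(n_splits):
        train_end = min_train + split * fold_size
        is_last = split == n_splits - 1
        val_end = n_samples if is_last else train_end + fold_size
        yield list(range(train_end)), list(range(train_end, val_end))


for train_idx, val_idx in forward_chaining_splits(10, 3, min_train=4):
    print("train:", train_idx, "validate:", val_idx)
```

No validation index ever precedes a training index, so the model cannot be fit on information from the period it is scored on. scikit-learn's TimeSeriesSplit provides a comparable expanding-window scheme if that library is already in use.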
In quant trading, cross-validation results are necessary but not sufficient: even rigorous CV can produce overconfident estimates due to dataset re-use across many model trials. Walk-forward backtests and out-of-sample lockboxes provide harder-to-fool reality checks.
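The effect of re-using one dataset across many trials can be seen in a toy simulation. Everything here is an assumption made for illustration: returns are pure Gaussian noise, each "strategy" is a random sequence of long/short positions with no skill, and a simple mean return on the development sample stands in for a cross-validated score. Picking the best of many such trials yields a positive development score by selection alone, and the untouched lockbox is what exposes it.

```python
import random
from statistics import mean

N_DEV, N_LOCKBOX, N_TRIALS = 500, 250, 200

# Synthetic daily returns with zero mean: no strategy has real skill here.
rng = random.Random(42)
returns = [rng.gauss(0, 0.01) for _ in range(N_DEV + N_LOCKBOX)]
dev, lockbox = returns[:N_DEV], returns[N_DEV:]


def random_strategy(seed, n_days):
    """A skill-free strategy: random long (+1) or short (-1) each day."""
    r = random.Random(1000 + seed)
    return [r.choice((-1, 1)) for _ in range(n_days)]


def score(positions, rets):
    """Mean daily return of the positions applied to the returns."""
    return mean(p * r for p, r in zip(positions, rets))


strategies = [random_strategy(s, N_DEV + N_LOCKBOX) for s in range(N_TRIALS)]

# Every trial is scored on the same development data, then the best is kept.
dev_scores = [score(s[:N_DEV], dev) for s in strategies]
best = max(range(N_TRIALS), key=lambda i: dev_scores[i])
best_dev = dev_scores[best]

# The lockbox is touched once, only for the selected strategy.
best_lockbox = score(strategies[best][N_DEV:], lockbox)

print(f"best development score: {best_dev:.5f}")
print(f"same strategy, lockbox: {best_lockbox:.5f}")
```

Because the winner was chosen for its development score, that score is biased upward even though no strategy has an edge; the lockbox score is an unbiased draw around zero and will usually be much less flattering.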