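A decision tree recursively splits the feature space into axis-aligned rectangles (hyperrectangles in more than two dimensions) and predicts a constant within each leaf. Training is greedy: at each node it picks the feature and threshold that most reduce impurity (Gini or entropy for classification, mean squared error for regression).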
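To make the greedy split search concrete, here is a minimal sketch of finding the single best split for classification with Gini impurity. The function names `gini` and `best_split`, the midpoint-threshold rule, and the toy data are illustrative assumptions, not any particular library's implementation. For regression, swapping `gini` for the variance of the targets gives the MSE criterion.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2, where p_k is the share of class k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y):
    """Greedy search for the axis-aligned split x[j] <= t that minimizes the
    size-weighted Gini impurity of the two children.

    Returns (feature_index, threshold, weighted_impurity); feature_index is
    None when no split improves on the parent node.
    """
    n = len(y)
    best = (None, None, gini(y))
    for j in range(len(X[0])):
        # Candidate thresholds: midpoints between consecutive distinct values.
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[j] <= t]
            right = [label for row, label in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best


X = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 1.0]]
y = [0, 0, 1, 1]
print(best_split(X, y))  # feature 0, threshold 2.5, impurity 0.0
```

A full tree would apply `best_split` recursively to each child until a stopping rule (depth, minimum leaf size, or no impurity gain) is met.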
Single trees are easy to interpret but unstable: a slightly different training sample can produce a very different tree.
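Random forests fit many deep trees, each on a bootstrap sample of the training data and considering only a random subset of features at each split, then average their predictions (or take a majority vote for classification). The randomization decorrelates the trees, so averaging reduces variance substantially at little cost in bias.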
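The recipe above (bootstrap rows, random feature subset per split, average) can be sketched for regression as follows. This is a toy sketch, not scikit-learn's or any other library's implementation: `build_tree`, `predict_tree`, and `fit_forest` are hypothetical names, trees are depth-limited for brevity, and the default `max_features=1` is an assumption chosen for the small example.

```python
import random


def build_tree(X, y, depth, max_features, rng):
    """Grow a least-squares regression tree. Leaves are floats (the mean
    target); internal nodes are tuples (feature, threshold, left, right)."""
    mean = sum(y) / len(y)
    if depth == 0 or len(set(y)) == 1:
        return mean
    # Random-forest twist: only a random subset of features is tried here.
    features = rng.sample(range(len(X[0])), max_features)
    best = None
    for j in features:
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [v for row, v in zip(X, y) if row[j] <= t]
            right = [v for row, v in zip(X, y) if row[j] > t]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t)
    if best is None:  # every sampled feature is constant in this node
        return mean
    _, j, t = best
    left_rows = [i for i, row in enumerate(X) if row[j] <= t]
    right_rows = [i for i, row in enumerate(X) if row[j] > t]
    return (
        j,
        t,
        build_tree([X[i] for i in left_rows], [y[i] for i in left_rows], depth - 1, max_features, rng),
        build_tree([X[i] for i in right_rows], [y[i] for i in right_rows], depth - 1, max_features, rng),
    )


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's constant."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if row[j] <= t else right
    return node


def fit_forest(X, y, n_trees=50, max_depth=4, max_features=1, seed=0):
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        # Bootstrap: draw n row indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx], max_depth, max_features, rng))
    # The forest prediction is the average of the individual tree predictions.
    return lambda row: sum(predict_tree(tree, row) for tree in trees) / len(trees)


X = [[float(i), float(i % 3)] for i in range(10)]  # feature 1 is noise
y = [0.0] * 5 + [10.0] * 5
forest = fit_forest(X, y, n_trees=50, max_depth=3, max_features=1)
print(forest([1.0, 0.0]), forest([8.0, 2.0]))
```

Drawing a fresh feature subset inside `build_tree`, rather than once per tree, matches the per-split randomization described above; it is what keeps a single strong feature from dominating every tree.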
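Gradient boosting (GBM, XGBoost, LightGBM, CatBoost) fits trees sequentially, typically shallow ones. Each new tree is fit to the negative gradient of the loss with respect to the current ensemble's predictions (for squared error this is simply the residual error) and is added to the ensemble, usually scaled by a small learning rate. Well-tuned boosting often beats random forests on tabular data and is the de facto standard for Kaggle-style problems.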
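A minimal sketch of the sequential residual-fitting loop for squared-error loss, using depth-1 trees (stumps) as the weak learners. The names `fit_stump` and `fit_gbm`, the learning rate of 0.1, and 100 rounds are illustrative assumptions; production libraries add refinements such as second-order gradient information and regularization that this sketch omits.

```python
def fit_stump(X, y):
    """Depth-1 regression tree fit by least squares; returns a predict function."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [v for row, v in zip(X, y) if row[j] <= t]
            right = [v for row, v in zip(X, y) if row[j] > t]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:  # every feature is constant: fall back to the mean
        m = sum(y) / len(y)
        return lambda row: m
    _, j, t, ml, mr = best
    return lambda row: ml if row[j] <= t else mr


def fit_gbm(X, y, n_rounds=100, learning_rate=0.1):
    f0 = sum(y) / len(y)  # initial model: the constant mean
    trees = []
    pred = [f0] * len(y)
    for _ in range(n_rounds):
        # For loss 1/2 (y - F)^2 the negative gradient is the residual y - F.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        h = fit_stump(X, residuals)
        trees.append(h)
        # Shrink each tree's contribution by the learning rate.
        pred = [pi + learning_rate * h(row) for pi, row in zip(pred, X)]
    return lambda row: f0 + learning_rate * sum(h(row) for h in trees)


X = [[float(i)] for i in range(10)]
y = [float(i * i) for i in range(10)]
model = fit_gbm(X, y)
print(model([3.0]), model([8.0]))
```

Because each least-squares stump is fit to the current residuals, every round can only lower (never raise) the training squared error; the small learning rate trades more rounds for smoother, less overfit ensembles.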
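Trees naturally capture non-linear and monotone relationships and feature interactions, and because splits depend only on the ordering of each feature's values, they are unaffected by strictly monotone transformations such as scaling or log transforms, so features need no scaling. Many implementations (e.g. XGBoost and LightGBM) also handle missing values natively. Tree models are commonly used in quant trading for non-linear signals on tabular features and in credit modeling for interpretable risk scores.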
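The claim that trees need no feature scaling can be checked directly: a strictly increasing transform preserves the order of a feature's values, so the best split sends exactly the same training rows left, even though the numeric threshold changes. A small sketch under that assumption; `best_partition` is a hypothetical helper and the data are made up for illustration.

```python
import math
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_partition(xs, ys):
    """Indices of the rows sent left by the single-feature split with the
    lowest weighted Gini impurity."""
    values = sorted(set(xs))
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [i for i, x in enumerate(xs) if x <= t]
        right = [i for i, x in enumerate(xs) if x > t]
        score = (len(left) * gini([ys[i] for i in left])
                 + len(right) * gini([ys[i] for i in right])) / len(xs)
        if best is None or score < best[0]:
            best = (score, frozenset(left))
    return best[1]


xs = [0.5, 1.2, 3.0, 7.5, 20.0, 55.0]
ys = [0, 0, 0, 1, 1, 1]
raw = best_partition(xs, ys)
logged = best_partition([math.log(x) for x in xs], ys)
standardized = best_partition([(x - 10.0) / 4.0 for x in xs], ys)
print(raw == logged == standardized)  # True
```

The same reasoning explains why z-scoring or min-max scaling, which matter for linear models and neural networks, leave a tree's learned partition unchanged.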