Your choice of loss isn't a technical detail: it defines what "correct" means for your model. Two models trained on the same data with different losses can learn different predictors, for example one targeting the conditional mean and the other the conditional median.
Regression losses
- Squared error $(y - \hat{y})^2$: penalizes large errors disproportionately. Sensitive to outliers. The conditional mean is the optimal predictor.
- Absolute error $|y - \hat{y}|$: outlier-robust. The conditional median is optimal.
- Huber $\tfrac{1}{2}(y - \hat{y})^2$ for $|y - \hat{y}| \le \delta$, linear beyond: smooth combination of the two. Standard for robust regression.
- Quantile $\max(\tau r, (\tau - 1) r)$ where $r = y - \hat{y}$: optimal predictor is the $\tau$-quantile. Use for asymmetric costs (overestimating delivery time vs underestimating).
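To make the "optimal predictor" claims above concrete, here is a minimal sketch in plain Python. The toy targets, the candidate grid, and the helper `best_constant` are made up for illustration; the Huber form assumes the common convention of $\delta(|r| - \tfrac{1}{2}\delta)$ on the linear part. It searches for the single constant prediction that minimizes each loss on a small sample with one outlier.

```python
import statistics

def squared_error(y, y_hat):
    return (y - y_hat) ** 2

def absolute_error(y, y_hat):
    return abs(y - y_hat)

def huber(y, y_hat, delta=1.0):
    # Quadratic for |r| <= delta, linear beyond; the two pieces meet smoothly at delta.
    r = abs(y - y_hat)
    return 0.5 * r ** 2 if r <= delta else delta * (r - 0.5 * delta)

def pinball(y, y_hat, tau):
    # Quantile (pinball) loss: under-prediction is weighted by tau,
    # over-prediction by 1 - tau.
    r = y - y_hat
    return max(tau * r, (tau - 1.0) * r)

def best_constant(loss, ys, candidates):
    # Hypothetical helper: brute-force the candidate with the lowest total loss.
    return min(candidates, key=lambda c: sum(loss(y, c) for y in ys))

# Toy targets with one outlier (illustrative numbers only).
ys = [1.0, 2.0, 3.0, 4.0, 100.0]
grid = [i / 10 for i in range(1001)]  # candidate predictions 0.0 .. 100.0

best_mse = best_constant(squared_error, ys, grid)
best_mae = best_constant(absolute_error, ys, grid)
best_q75 = best_constant(lambda y, c: pinball(y, c, tau=0.75), ys, grid)

print(best_mse, statistics.mean(ys))    # 22.0 22.0
print(best_mae, statistics.median(ys))  # 3.0 3.0
print(best_q75)                         # 4.0
```

The outlier drags the squared-error optimum up to the mean (22.0), while the absolute-error optimum stays at the median (3.0). Raising $\tau$ to 0.75 moves the pinball optimum to a higher point of the sample, which is the behavior you want when underestimating is the costlier mistake.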
Classification losses
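- Cross-entropy $-\sum_k y_k \log \hat{p}_k$: the standard choice. As a proper scoring rule it pushes the model toward calibrated probabilities, although it does not guarantee calibration in practice. Large gradient when confident and wrong, which gives a fast learning signal.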
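- Hinge $\max(0, 1 - y f(x))$ with $y \in \{-1, +1\}$: SVM loss. Doesn't reward beyond-margin correctness; zero gradient there. Doesn't give probabilities.
- Focal $-(1 - p_t)^\gamma \log p_t$, where $p_t$ is the predicted probability of the true class: down-weights easy examples. Used for class imbalance, especially in detection.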
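The three classification losses above can be compared on single examples with a short sketch. The function names and the sample probabilities are illustrative assumptions, and each function handles one example rather than a batch; `gamma=2.0` is only a commonly used default, not a recommendation from this text.

```python
import math

def cross_entropy(p_true):
    # Loss for one example, given the probability assigned to the true class.
    return -math.log(p_true)

def hinge(y, score):
    # y is in {-1, +1}; score is the raw, unbounded model output f(x).
    return max(0.0, 1.0 - y * score)

def focal(p_true, gamma=2.0):
    # Cross-entropy scaled by (1 - p_true)^gamma, so easy examples count less.
    return -((1.0 - p_true) ** gamma) * math.log(p_true)

# Confident and wrong vs. confident and right (made-up probabilities).
print(cross_entropy(0.01), cross_entropy(0.99))

# Hinge is flat past the margin: both scores give the same loss of 0.0.
print(hinge(+1, 1.5), hinge(+1, 5.0))

# Fraction of the cross-entropy that focal loss keeps for an easy
# example (p_true = 0.9) and a hard one (p_true = 0.1).
easy_ratio = focal(0.9) / cross_entropy(0.9)
hard_ratio = focal(0.1) / cross_entropy(0.1)
print(easy_ratio, hard_ratio)
```

With `gamma=2.0` the easy example keeps about 1% of its cross-entropy and the hard one about 81%, which is how focal loss shifts the training signal toward rare or difficult cases. Setting `gamma=0.0` recovers plain cross-entropy.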
Picking a loss
Start from the business cost. If false positives and false negatives cost the same, cross-entropy is fine. If they don't, either use a weighted loss or tune the decision threshold after training (cheaper and more flexible, since changing the costs later doesn't require retraining).
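Tuning the threshold after training can be sketched as a simple sweep over candidate cutoffs on held-out predictions. Everything here is an assumption for illustration: the probabilities and labels are made up, the costs (a false negative is five times as expensive as a false positive) are arbitrary, and `total_cost` is a hypothetical helper rather than a library function.

```python
def total_cost(probs, labels, threshold, cost_fp, cost_fn):
    # Sum the cost of every false positive and false negative at this threshold.
    cost = 0.0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 0:
            cost += cost_fp
        elif pred == 0 and y == 1:
            cost += cost_fn
    return cost

# Held-out predicted probabilities and true labels (illustrative only).
probs = [0.05, 0.2, 0.35, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 1, 0, 1, 1, 1]

cost_fp, cost_fn = 1.0, 5.0  # assumed business costs
candidates = [i / 10 for i in range(1, 10)]  # thresholds 0.1 .. 0.9

best_t = min(
    candidates,
    key=lambda t: total_cost(probs, labels, t, cost_fp, cost_fn),
)
cost_default = total_cost(probs, labels, 0.5, cost_fp, cost_fn)
cost_tuned = total_cost(probs, labels, best_t, cost_fp, cost_fn)

print(best_t, cost_default, cost_tuned)  # 0.3 5.0 1.0
```

On this toy sample, lowering the threshold from 0.5 to 0.3 trades one cheap false positive for one expensive false negative. In practice the sweep should run on a validation set that was not used for training, and a sample this small is only meant to show the mechanics.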