The loss function defines what the network is being trained to do. Common choices:
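- Mean Squared Error (MSE): $L = \frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2$, where $\hat{y}_i$ is the prediction for target $y_i$. Standard for regression. Sensitive to outliers.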
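- Mean Absolute Error (MAE) / Huber: $L = \frac{1}{N}\sum_{i=1}^{N} |y_i - \hat{y}_i|$, or a smooth blend (Huber) that is quadratic for small residuals and linear for large ones. More robust to outliers. MAE is harder to optimize than MSE because its gradient has constant magnitude and is undefined at zero residual.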
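- Cross-entropy: for classification, $L = -\sum_{c} y_c \log \hat{p}_c$, where $y_c$ is the one-hot target and $\hat{p}_c$ is the predicted probability of class $c$. Softmax + cross-entropy is the standard for multi-class.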
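- Binary cross-entropy: $L = -\left[\, y \log \hat{p} + (1 - y) \log(1 - \hat{p}) \,\right]$, for a label $y \in \{0, 1\}$ and predicted probability $\hat{p}$.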
- Custom losses: Sharpe-aware loss, drawdown-penalized loss, asymmetric loss for asymmetric costs of over- vs. under-prediction.
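
The standard losses in the list above can be sketched in plain Python to make the definitions concrete. This is a minimal illustration, not a training-ready implementation: the function names, the `delta` and `eps` defaults, and the toy data are all assumptions chosen for the example, and a real project would normally use the vectorized versions shipped with its deep learning framework.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error: average of absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def huber(y_true, y_pred, delta=1.0):
    """Huber loss: quadratic for |residual| <= delta, linear beyond it.
    delta=1.0 is an illustrative default, not a recommendation."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        r = abs(t - p)
        total += 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)
    return total / len(y_true)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Binary cross-entropy averaged over samples.
    Probabilities are clipped to [eps, 1 - eps] to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)

def softmax_cross_entropy(logits, target_index):
    """Cross-entropy of softmax(logits) against one true class index,
    computed as log-sum-exp minus the target logit for numerical stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]

# Toy data (hypothetical): the second prediction set contains one large miss.
y_true = [1.0, 2.0, 3.0, 4.0]
clean = [1.1, 1.9, 3.2, 3.8]
outlier = [1.1, 1.9, 3.2, 14.0]

for name, loss in [("MSE", mse), ("MAE", mae), ("Huber", huber)]:
    print(name, loss(y_true, clean), loss(y_true, outlier))
```

On this toy data the single large miss inflates MSE far more than MAE or Huber, which is the outlier sensitivity noted in the list.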
Pick the loss to match the cost structure of mistakes. In trading, the cost of a wrong-direction prediction usually dwarfs the cost of a mis-sized one, so direction-aware losses (rank loss, sign-prediction loss) often outperform pure MSE on returns.
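
As a rough illustration of matching the loss to the cost of mistakes, the sketch below shows two custom losses in plain Python: an asymmetric squared error and an MSE variant that up-weights wrong-sign predictions. The function names, the weights (`under_weight`, `over_weight`, `penalty`), and the return values are hypothetical, chosen only to show the effect. The sign penalty is one simple way to encode direction awareness and is not the rank loss mentioned above.

```python
def asymmetric_squared_loss(y_true, y_pred, under_weight=2.0, over_weight=1.0):
    """Squared error with different weights for under-prediction
    (y_pred < y_true) and over-prediction. Weights are illustrative."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        r = t - p
        w = under_weight if r > 0 else over_weight
        total += w * r * r
    return total / len(y_true)

def sign_penalized_mse(y_true, y_pred, penalty=5.0):
    """MSE in which samples whose predicted sign disagrees with the
    true sign are multiplied by `penalty`. With penalty=1.0 this
    reduces to plain MSE."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        w = penalty if t * p < 0 else 1.0
        total += w * (t - p) ** 2
    return total / len(y_true)

# Hypothetical returns and two forecasts:
# pred_a gets both directions right but mis-sizes both moves;
# pred_b nails the first move and gets the direction of the second wrong.
returns = [0.02, -0.01]
pred_a = [0.01, -0.02]
pred_b = [0.02, 0.004]

# Plain MSE (penalty=1.0) slightly prefers pred_b despite the wrong sign.
print(sign_penalized_mse(returns, pred_a, penalty=1.0),
      sign_penalized_mse(returns, pred_b, penalty=1.0))

# The sign-penalized version prefers pred_a.
print(sign_penalized_mse(returns, pred_a),
      sign_penalized_mse(returns, pred_b))
```

The hard sign test is discontinuous where a prediction crosses zero, so for gradient-based training a smooth surrogate would likely be needed. The version here is best read as an evaluation metric or a starting point.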