Loss Functions — Section 20: Deep Learning

The loss function defines what the network is being trained to do. Common choices:

Mean Squared Error (MSE): $(y - \hat{y})^2$, averaged over samples. Standard for regression. Sensitive to outliers, because squaring lets large errors dominate.
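Mean Absolute Error (MAE) / Huber: $|y - \hat{y}|$, or for Huber a smooth blend that is quadratic for small errors and linear for large ones. Both are more robust to outliers than MSE. MAE is harder to optimize because its gradient has constant magnitude and is undefined at zero error; Huber removes that kink.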
Cross-entropy: for classification, $-\sum_c y_c \log \hat{p}_c$. Softmax + cross-entropy is the standard for multi-class.
Binary cross-entropy: $-y \log \hat{p} - (1-y)\log(1-\hat{p})$.
Custom losses: Sharpe-aware loss, drawdown-penalized loss, asymmetric loss for asymmetric costs of over- vs. under-prediction.
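
The standard losses listed above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function names, the Huber threshold `delta=1.0`, and the clipping constant `eps` are assumptions chosen for the example, and a deep learning framework's built-in losses would normally be used in practice.

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error: average of (y - y_hat)^2."""
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    """Mean absolute error: average of |y - y_hat|."""
    return np.mean(np.abs(y - y_hat))

def huber(y, y_hat, delta=1.0):
    """Huber loss: quadratic for |error| <= delta, linear beyond it.
    delta is an assumed default, not a recommended value."""
    err = np.abs(y - y_hat)
    quadratic = 0.5 * err ** 2
    linear = delta * (err - 0.5 * delta)
    return np.mean(np.where(err <= delta, quadratic, linear))

def cross_entropy(y_onehot, logits):
    """Softmax + cross-entropy computed from raw logits.
    Subtracting the row max keeps the exponentials from overflowing."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_p = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -np.mean(np.sum(y_onehot * log_p, axis=-1))

def binary_cross_entropy(y, p_hat, eps=1e-12):
    """Binary cross-entropy on predicted probabilities.
    Clipping by eps (an assumed constant) avoids log(0)."""
    p = np.clip(p_hat, eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

# Toy data: three perfect predictions and one outlier target.
y = np.array([0.0, 0.0, 0.0, 10.0])
y_hat = np.zeros(4)
print(mse(y, y_hat), mae(y, y_hat), huber(y, y_hat))
```

On this toy input the single outlier pushes MSE to 25, while MAE is 2.5 and Huber is 2.375, which is the outlier sensitivity described in the list.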

Pick the loss to match the cost structure of mistakes. In trading, the cost of a wrong-direction prediction usually dwarfs the cost of a magnitude error, so direction-aware losses (rank loss, sign prediction) often outperform pure MSE on returns.
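
One way to encode that cost structure is to scale the squared error whenever the predicted sign disagrees with the realized sign. The sketch below is a hypothetical illustration, not a standard library loss: the name `direction_weighted_mse` and the `penalty=3.0` multiplier are assumptions, and the hard sign test is not smooth at zero, so a training setup might prefer a smooth surrogate.

```python
import numpy as np

def direction_weighted_mse(y, y_hat, penalty=3.0):
    """Squared error, multiplied by `penalty` where the prediction has
    the wrong sign. `penalty` is a hypothetical tuning knob."""
    sq_err = (y - y_hat) ** 2
    wrong_sign = (y * y_hat) < 0          # True where signs disagree
    weights = np.where(wrong_sign, penalty, 1.0)
    return np.mean(weights * sq_err)

# Two predictions with (up to rounding) the same absolute error of 0.02.
y = np.array([0.01, 0.01])                # realized returns
right_dir = np.array([0.03, 0.03])        # overshoots, correct sign
wrong_dir = np.array([-0.01, -0.01])      # same error size, wrong sign

loss_right = direction_weighted_mse(y, right_dir)
loss_wrong = direction_weighted_mse(y, wrong_dir)
print(loss_right, loss_wrong)
```

Plain MSE scores the two predictions identically, since both miss by 0.02. The weighted version charges the wrong-direction prediction three times as much, which is the asymmetry the paragraph above argues for.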
