Feedforward Networks — Section 20: Deep Learning

A feedforward network is a stack of affine transformations interspersed with non-linear activations:

$$h^{(l)} = \phi(W^{(l)} h^{(l-1)} + b^{(l)})$$
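
The recurrence above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the helper names (`relu`, `affine`, `forward`) and the hand-picked weights are assumptions made for the example, and the last layer is left linear, a common choice for regression outputs that the equation itself does not specify.

```python
def relu(v):
    # Element-wise ReLU: max(0, x) for each component.
    return [max(0.0, x) for x in v]

def affine(W, b, h):
    # Computes W h + b, with W given as a list of rows.
    return [sum(w * x for w, x in zip(row, h)) + b_i
            for row, b_i in zip(W, b)]

def forward(layers, x):
    # layers: list of (W, b) pairs, applied in order.
    # Assumption: ReLU on hidden layers, no activation on the final layer.
    h = x
    for i, (W, b) in enumerate(layers):
        z = affine(W, b, h)
        h = relu(z) if i < len(layers) - 1 else z
    return h

# Hypothetical 2 -> 2 -> 1 network with hand-chosen weights.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]),
    ([[2.0, 1.0]], [0.5]),
]
print(forward(layers, [3.0, 1.0]))  # [5.5]
```

For the input `[3.0, 1.0]` the hidden pre-activations are `[2.0, 1.0]`, both positive, so ReLU passes them through and the output is `2*2 + 1*1 + 0.5 = 5.5`.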

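The non-linearity $\phi$ (usually ReLU these days) is what makes the network more than a single big linear function: without it, the stacked affine maps compose into one affine map. The universal approximation theorem says a single hidden layer with enough units and a suitable (non-polynomial) activation can approximate any continuous function on a compact domain arbitrarily well. The theorem says nothing about how many units are needed or whether training will find the right weights; in practice, deep networks often reach a given accuracy with far fewer parameters than a single very wide layer.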
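
To make the role of $\phi$ concrete, take $\phi$ to be the identity in two consecutive layers (an assumption made only for this illustration) and write $x = h^{(0)}$. Substituting the first layer into the second gives

$$h^{(2)} = W^{(2)}\left(W^{(1)} x + b^{(1)}\right) + b^{(2)} = \left(W^{(2)} W^{(1)}\right) x + \left(W^{(2)} b^{(1)} + b^{(2)}\right),$$

which is again a single affine map with weight $W^{(2)} W^{(1)}$ and bias $W^{(2)} b^{(1)} + b^{(2)}$. The same argument applies to any number of layers, so depth adds nothing unless $\phi$ is non-linear.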

Width, depth, activation choice, and parameter initialization all affect whether training converges and how well the network generalizes. Modern feedforward networks often add residual (skip) connections and layer normalization to stabilize training.
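
One way to combine a residual connection with layer normalization is sketched below. It assumes a "pre-norm" arrangement, $h + \phi(W\,\mathrm{LN}(h) + b)$, which is only one of several placements in use. The function names are illustrative, and the learnable gain and bias that layer normalization usually carries are omitted for brevity.

```python
import math

def relu(v):
    return [max(0.0, x) for x in v]

def affine(W, b, h):
    # Computes W h + b, with W given as a list of rows.
    return [sum(w * x for w, x in zip(row, h)) + b_i
            for row, b_i in zip(W, b)]

def layer_norm(v, eps=1e-5):
    # Normalize one vector to zero mean and (approximately) unit variance.
    # Simplification: no learnable scale/shift parameters.
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]

def residual_block(W, b, h):
    # Pre-norm residual block (an assumed layout): h + relu(W * LN(h) + b).
    # W must be square so the branch output can be added back to h.
    branch = relu(affine(W, b, layer_norm(h)))
    return [h_i + z_i for h_i, z_i in zip(h, branch)]

# With zero weights the branch contributes nothing and the block is the identity.
zeros_W = [[0.0] * 3 for _ in range(3)]
zeros_b = [0.0] * 3
print(residual_block(zeros_W, zeros_b, [1.0, 2.0, 3.0]))  # [1.0, 2.0, 3.0]
```

The zero-weight case shows why skip connections help optimization: the block can start at or near the identity map, so each added layer only has to learn a correction to its input.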

In quant work, feedforward networks fit non-linear functions of tabular features: return forecasts, default probabilities, slippage models. Tree ensembles usually outperform them on tabular data, but feedforward networks can be competitive when feature interactions are deep.
