Backpropagation computes gradients of a loss with respect to all network parameters by repeatedly applying the chain rule, starting from the output and moving backward through the layers.
For a network , the gradient with respect to layer uses gradients computed in layer :
The "backward pass" reuses intermediate computations from the forward pass — without that reuse, gradients would cost per parameter. With it, the cost is comparable to a single forward pass.
Modern frameworks (PyTorch, JAX, TensorFlow) handle backprop automatically via autograd. The only thing you really need to specify is the forward pass; everything else is computed via reverse-mode differentiation through the graph.
Practical issues: vanishing gradients in deep nets (mostly fixed by ReLU and residual connections), exploding gradients in RNNs (use gradient clipping), and numerical instability with poorly scaled inputs.