Deep nets have so many parameters that regularization isn't optional.
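Weight decay penalizes large weights by adding $\frac{\lambda}{2}\lVert w \rVert_2^2$ to the loss, i.e. L2 regularization. It's almost always worth using. With plain SGD, adding this penalty is the same as shrinking every weight by a constant factor each step. With adaptive optimizers such as Adam the two are not equivalent, which is why AdamW decouples the decay from the gradient-based update for cleaner behavior.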
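To see why the decoupling matters, here is a minimal pure-Python sketch on a single scalar weight. The helper names (`sgd_l2_step`, `sgd_decoupled_step`, `adam_step`) are illustrative, not any library's API, and the Adam update assumes the standard bias-corrected form with default betas. For SGD, folding $\lambda w$ into the gradient and shrinking the weight directly give the same step. For Adam, the L2 term passes through the adaptive denominator and gets rescaled, while the decoupled (AdamW-style) decay does not.

```python
import math

def sgd_l2_step(w, grad, lr, lam):
    # L2 regularization: the gradient of (lam/2) * w**2 is lam * w,
    # so it is simply added to the data-loss gradient.
    return w - lr * (grad + lam * w)

def sgd_decoupled_step(w, grad, lr, lam):
    # Decoupled weight decay: shrink w directly, separately from the gradient step.
    return w - lr * grad - lr * lam * w

def adam_step(w, grad, m, v, t, lr, lam, decoupled,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step on a scalar parameter (t is the 1-based step count).

    decoupled=False: L2 penalty folded into the gradient ("Adam + L2").
    decoupled=True:  AdamW-style decay applied outside the adaptive update.
    """
    if not decoupled:
        grad = grad + lam * w                   # penalty goes through m and v
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    w_new = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    if decoupled:
        w_new -= lr * lam * w                   # decay is NOT normalized by sqrt(v_hat)
    return w_new, m, v

# Usage: one step from w = 1.0 with gradient 0.5, lr = 0.01, lam = 0.1.
w_adam_l2, _, _ = adam_step(1.0, 0.5, 0.0, 0.0, 1, 0.01, 0.1, decoupled=False)
w_adamw, _, _ = adam_step(1.0, 0.5, 0.0, 0.0, 1, 0.01, 0.1, decoupled=True)
print(w_adam_l2, w_adamw)
```

On the first Adam step the adaptive normalization turns any gradient into a step of roughly `lr`, so the folded-in L2 penalty has essentially no effect (the weight ends near 0.99), whereas the decoupled decay still removes an extra `lr * lam * w` (the weight ends near 0.989).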
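Dropout randomly zeros out a fraction $p$ of activations during training, so the network can't rely on any single neuron. It can be viewed as approximately averaging an ensemble of many weight-sharing subnetworks. Dropout is switched off at inference; most frameworks instead scale the surviving activations by $1/(1-p)$ during training so no rescaling is needed at test time. Use lower dropout rates in convolutional layers and higher ones (e.g. $p = 0.5$) in fully-connected layers.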
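A minimal sketch of dropout on a plain list of activations, assuming the "inverted dropout" convention (survivors scaled by $1/(1-p)$ during training, identity at inference). The function name and signature are illustrative only.

```python
import random

def dropout(activations, p, training, rng=random):
    """Inverted dropout on a list of floats.

    p is the probability of zeroing each activation (the drop rate).
    Survivors are scaled by 1 / (1 - p) during training so each activation's
    expected value is unchanged; at inference the input passes through as-is.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    # Each unit survives independently with probability `keep`.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

# Usage: with p = 0.5, every training-time output is either 0.0 or 2.0.
print(dropout([1.0] * 8, 0.5, training=True, rng=random.Random(0)))
print(dropout([1.0] * 8, 0.5, training=False))
```

Scaling at training time rather than test time keeps the inference path free of any dropout-specific logic.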
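Early stopping monitors validation loss and halts training once it has stopped improving for a set number of epochs (the "patience"), usually keeping the checkpoint with the best validation loss. It's a free regularizer that also saves compute.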
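A minimal sketch of patience-based early stopping in plain Python. The class name and the `patience`/`min_delta` parameters are illustrative assumptions rather than a specific framework's API; a real training loop would save the model weights whenever a new best is recorded.

```python
class EarlyStopping:
    """Signal a stop when validation loss hasn't improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait after the last improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.best_epoch = epoch   # a real loop would checkpoint weights here
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Usage with made-up validation losses.
stopper = EarlyStopping(patience=3)
for epoch, loss in enumerate([1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.69]):
    if stopper.step(epoch, loss):
        print("stopping at epoch", epoch, "best epoch", stopper.best_epoch)  # 5 and 2
        break
```

Note the trade-off the example exposes: the loop stops at epoch 5 and never sees the slightly better 0.69 at epoch 6. A larger patience tolerates noisier validation curves at the cost of more compute.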
Data augmentation, batch normalization, and label smoothing are also common regularizers in production deep-learning pipelines.
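Label smoothing is the easiest of these to show in a few lines. A minimal sketch, assuming the common formulation that mixes the one-hot target with a uniform distribution over $K$ classes, $y_k = (1-\varepsilon)\,[k = \text{target}] + \varepsilon/K$; the helper names are illustrative.

```python
import math

def smooth_labels(target, num_classes, eps):
    # y_k = (1 - eps) * [k == target] + eps / num_classes
    return [(1.0 - eps) * (1.0 if k == target else 0.0) + eps / num_classes
            for k in range(num_classes)]

def cross_entropy(logits, target_dist):
    # Numerically stable log-softmax, then -sum_k y_k * log p_k.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(y * (z - log_z) for y, z in zip(target_dist, logits))

# Usage: a very confident, correct prediction for class 0 out of 4.
logits = [10.0, 0.0, 0.0, 0.0]
print(cross_entropy(logits, smooth_labels(0, 4, 0.0)))  # hard one-hot target
print(cross_entropy(logits, smooth_labels(0, 4, 0.1)))  # smoothed target
```

With the hard target the loss is already close to zero and keeps rewarding ever-larger logits; with the smoothed target the loss stays well above zero for such an over-confident prediction, which is the regularizing effect.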