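Kullback–Leibler (KL) divergence measures how different a distribution $P$ is from a reference $Q$. For discrete distributions (with an integral over densities in the continuous case):

$$D_{\mathrm{KL}}(P \,\|\, Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}$$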
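KL is always non-negative and zero iff $P = Q$, that is, when the distribution matches the reference exactly. It is not symmetric: $D_{\mathrm{KL}}(P \,\|\, Q) \neq D_{\mathrm{KL}}(Q \,\|\, P)$ in general, so it's not a true metric.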
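The properties above can be checked numerically. The following is a minimal sketch for discrete distributions given as lists of probabilities. The function name `kl_divergence` and the example vectors `p` and `q` are illustrative assumptions, not taken from the text, and the result is in nats because the natural logarithm is used.

```python
import math


def kl_divergence(p, q):
    """Discrete KL divergence D_KL(P || Q) in nats (hypothetical helper).

    Terms with p_i == 0 contribute 0 by the usual convention.
    Returns infinity if q_i == 0 somewhere that p_i > 0.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # 0 * log(0 / q) is taken to be 0
        if q_i == 0.0:
            return math.inf  # P puts mass where Q has none
        total += p_i * math.log(p_i / q_i)
    return total


# Illustrative distributions over three outcomes (made up for this sketch).
p = [0.5, 0.4, 0.1]
q = [0.8, 0.1, 0.1]

forward = kl_divergence(p, q)  # D_KL(P || Q)
reverse = kl_divergence(q, p)  # D_KL(Q || P)
same = kl_divergence(p, p)     # divergence of P from itself

print(f"D_KL(P || Q) = {forward:.4f}")
print(f"D_KL(Q || P) = {reverse:.4f}")
print(f"D_KL(P || P) = {same:.4f}")
```

Both directions come out positive but unequal, and the divergence of a distribution from itself is zero, which matches the non-negativity and asymmetry stated above.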
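KL is the workhorse of probabilistic ML. Variational inference picks an approximate posterior $q(z)$ to minimize $D_{\mathrm{KL}}(q(z) \,\|\, p(z \mid x))$, its divergence from the true posterior. Cross-entropy loss in classification equals the KL divergence between the empirical and predicted label distributions plus the entropy of the empirical labels, a constant that does not depend on the model. In risk, KL appears in entropy-based VaR estimates and model-validation tests.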
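The link between cross-entropy and KL rests on the identity $H(P, Q) = H(P) + D_{\mathrm{KL}}(P \,\|\, Q)$, where $H(P)$ is the entropy of the label distribution. The sketch below checks it on made-up label vectors. The helper names and the example values are assumptions for illustration, and the code assumes the predicted probabilities are strictly positive wherever the labels have mass.

```python
import math


def entropy(p):
    """Shannon entropy H(P) in nats; zero-probability terms are skipped."""
    return -sum(p_i * math.log(p_i) for p_i in p if p_i > 0.0)


def cross_entropy(p, q):
    """Cross-entropy H(P, Q) in nats; assumes q_i > 0 wherever p_i > 0."""
    return -sum(p_i * math.log(q_i) for p_i, q_i in zip(p, q) if p_i > 0.0)


def kl_divergence(p, q):
    """D_KL(P || Q) in nats; assumes q_i > 0 wherever p_i > 0."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0.0)


# Illustrative soft labels and model predictions (made up for this sketch).
soft_label = [0.7, 0.2, 0.1]
predicted = [0.6, 0.3, 0.1]

lhs = cross_entropy(soft_label, predicted)
rhs = entropy(soft_label) + kl_divergence(soft_label, predicted)
print(f"H(P, Q)            = {lhs:.6f}")
print(f"H(P) + D_KL(P || Q) = {rhs:.6f}")

# With a one-hot label the entropy term is zero, so the cross-entropy
# loss and the KL divergence coincide.
one_hot = [0.0, 1.0, 0.0]
ce_one_hot = cross_entropy(one_hot, predicted)
kl_one_hot = kl_divergence(one_hot, predicted)
print(f"one-hot: CE = {ce_one_hot:.6f}, KL = {kl_one_hot:.6f}")
```

Because $H(P)$ depends only on the labels, minimizing cross-entropy over model parameters selects the same predictions as minimizing the KL divergence.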