An SVM finds the linear decision boundary that maximizes the margin between two classes. For linearly separable data:
The dual formulation depends only on inner products — the kernel trick replaces them with a kernel function to fit non-linear boundaries (RBF, polynomial, sigmoid).
For non-separable data, slack variables allow some misclassification at a cost (soft-margin SVM). The trade-off is governed by a hyperparameter .
SVMs were dominant in the 2000s before being largely supplanted by tree ensembles and deep learning on most problems. They're still used in some structured settings — text classification, anomaly detection, and high-dimensional small-data problems where kernels can capture useful geometry.