Classification Metrics — Section 5: Classification

Accuracy alone is a poor metric for most real classification problems — especially with imbalanced classes. The right metric depends on the costs of different errors.

Confusion matrix

For binary classification, four cells: True Positives, False Positives, True Negatives, False Negatives. Every metric is some ratio of these four numbers.
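
To make the four cells concrete, here is a minimal sketch that tallies them from paired labels. It assumes labels are encoded as 0 (negative) and 1 (positive); the function name `confusion_counts` and the toy data are illustrative only.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels encoded as 0/1 (an assumed encoding)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1          # predicted positive, actually positive
        elif p == 1 and t == 0:
            fp += 1          # predicted positive, actually negative
        elif p == 0 and t == 0:
            tn += 1          # predicted negative, actually negative
        else:
            fn += 1          # predicted negative, actually positive
    return tp, fp, tn, fn


# Toy data, made up for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(tp, fp, tn, fn)  # 2 1 4 1
```

The four counts always sum to the number of examples, which is a cheap sanity check on any confusion matrix.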

Precision and Recall

Precision = TP / (TP + FP). Of the items we said were positive, what fraction actually are? Important when false positives are costly (spam detection — don't flag legit emails).
Recall (sensitivity, TPR) = TP / (TP + FN). Of all the truly positive items, what fraction did we catch? Important when false negatives are costly (medical screening — don't miss the disease).
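
The two ratios can be sketched directly from confusion-matrix counts. The counts below are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here, not part of the definitions.

```python
def precision(tp, fp):
    # Fraction of predicted positives that are truly positive.
    # With no predicted positives the ratio is undefined; 0.0 is an assumed convention.
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    # Fraction of truly positive items that were caught.
    # With no actual positives the ratio is undefined; 0.0 is an assumed convention.
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts: 40 items flagged (30 correct), 50 items truly positive.
tp, fp, fn = 30, 10, 20

p = precision(tp, fp)
r = recall(tp, fn)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60
```

Note that true negatives appear in neither formula, which is why both metrics stay meaningful when negatives dominate the data.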

F1 score

Harmonic mean of precision and recall: $F_1 = 2 \cdot P \cdot R / (P + R)$. Useful when you care about both equally. $F_\beta = (1 + \beta^2) \cdot P \cdot R / (\beta^2 \cdot P + R)$ generalizes: $F_2$ weighs recall more, $F_{0.5}$ weighs precision more.
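
A small sketch of how the weighting plays out, using the usual $F_\beta$ definition (written out in the code comment). The precision and recall values are made up, and the zero-denominator fallback is an assumed convention.

```python
def f_beta(p, r, beta=1.0):
    # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    # beta > 1 pulls the score toward recall, beta < 1 toward precision.
    b2 = beta * beta
    denom = b2 * p + r
    return (1 + b2) * p * r / denom if denom else 0.0  # 0.0 fallback is an assumption


# Illustrative values: precision is higher than recall.
p, r = 0.75, 0.60

f1 = f_beta(p, r)          # plain harmonic mean
f2 = f_beta(p, r, 2.0)     # recall-weighted
f05 = f_beta(p, r, 0.5)    # precision-weighted
print(f"F1={f1:.3f} F2={f2:.3f} F0.5={f05:.3f}")  # F1=0.667 F2=0.625 F0.5=0.714
```

With recall as the weaker number, $F_2$ drops below $F_1$ and $F_{0.5}$ rises above it, matching the direction of the weighting.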

ROC and AUC

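The Receiver Operating Characteristic (ROC) curve plots TPR against FPR = FP / (FP + TN) as the decision threshold varies. AUC is the area under this curve; it equals the probability that a randomly chosen positive scores higher than a randomly chosen negative. AUC = 0.5 is random ranking; 1.0 is perfect. AUC is threshold-independent, which is both its strength (it summarizes ranking quality without committing to a threshold) and its weakness (it says nothing about performance at the threshold you actually deploy, or about calibration).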
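
The probabilistic reading of AUC can be checked by brute force: count the fraction of (positive, negative) pairs in which the positive gets the higher score. This is a minimal sketch, not an efficient implementation; counting ties as half a win is a common convention assumed here, and the scores are made up.

```python
def auc_pairwise(y_true, scores):
    """AUC as P(score of random positive > score of random negative).

    O(P * N) comparisons; ties count as 0.5 (an assumed convention).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores: one positive (0.4) is ranked below one negative (0.7).
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]

auc = auc_pairwise(y_true, scores)
print(round(auc, 4))  # 0.9167, i.e. 11 of 12 pairs ordered correctly

# Squaring the scores preserves their order, so AUC does not change.
auc_squared = auc_pairwise(y_true, [s ** 2 for s in scores])
```

Because only the ordering of scores matters, any order-preserving rescaling leaves AUC unchanged. That is the threshold independence in action, and also why AUC cannot detect badly calibrated probabilities.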

Precision-Recall curve

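PR curves are usually more informative than ROC curves for imbalanced data. When negatives vastly outnumber positives, FPR stays small even if false positives outnumber true positives, so ROC can look great while precision is terrible. AUC-PR is the area under the PR curve; for a random classifier it is roughly the positive-class prevalence rather than 0.5.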
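
The gap between the two views shows up on a constructed example. The sketch below builds a hypothetical ranking of 1,000 negatives and 10 positives in which 50 negatives outrank every positive, then computes ROC AUC (pairwise form) and average precision, a common step-wise estimate of AUC-PR. All data and function names are illustrative assumptions.

```python
def roc_auc(y_true, scores):
    # Fraction of (positive, negative) pairs ranked correctly; ties count 0.5.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    # Mean of precision@rank taken at the rank of each positive.
    # Assumes at least one positive and no tied scores.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Constructed ranking: 50 negatives on top, then 10 positives, then 950 negatives.
y_true = [0] * 50 + [1] * 10 + [0] * 950
n = len(y_true)
scores = [(n - i) / n for i in range(n)]  # strictly decreasing, so list order = rank order

auc = roc_auc(y_true, scores)
ap = average_precision(y_true, scores)
print(f"ROC AUC={auc:.3f} average precision={ap:.3f}")  # ROC AUC=0.950 average precision=0.097
```

Each positive beats 950 of the 1,000 negatives, so ROC AUC is 0.95. Yet anyone acting on the top 60 items would see 50 false alarms for 10 hits, which the average precision of about 0.1 reflects.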

Log loss

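$-\frac{1}{N} \sum_{i=1}^{N} [y_i \log p_i + (1 - y_i) \log (1 - p_i)]$, where $y_i \in \{0, 1\}$ is the true label and $p_i$ is the predicted probability of the positive class. The $1/N$ averaging is conventional; without it the sum is the negative log-likelihood. Cares about CALIBRATION: it penalizes confidently wrong predictions heavily. Used for probabilistic forecasting (weather, sports betting, advertising).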
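
A short sketch of the penalty for confident mistakes. It averages over the examples (a common convention; multiply by the number of examples to get the plain sum) and clips probabilities away from 0 and 1 so the logarithm stays finite. The clipping constant `eps` and the toy predictions are assumptions for illustration.

```python
import math


def log_loss(y_true, probs, eps=1e-15):
    """Mean binary cross-entropy; probs are predicted P(y = 1)."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0); eps is an assumed choice
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


y_true = [1, 0, 1, 0]

hedged = [0.6, 0.4, 0.6, 0.4]              # right direction, low confidence
confident_right = [0.99, 0.01, 0.99, 0.01]  # right direction, high confidence
confident_wrong = [0.99, 0.01, 0.01, 0.01]  # third example is confidently wrong

for name, probs in [("hedged", hedged),
                    ("confident_right", confident_right),
                    ("confident_wrong", confident_wrong)]:
    print(f"{name}: {log_loss(y_true, probs):.3f}")
```

A single confidently wrong prediction pushes the mean loss above that of the uniformly hedged forecaster, even though the other three predictions are near perfect.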

When accuracy is fine

Accuracy is fine when all three hold: balanced classes, symmetric error costs, mostly-correct predictions. For most real problems at least one of those fails, and precision, recall, or AUC is the better choice.
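
The balanced-classes condition is easy to see with a degenerate classifier that always predicts the negative class. The sketch below uses made-up datasets of 1,000 examples; the function names are illustrative.

```python
def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    # Fraction of true positives that were predicted positive.
    # Assumes y_true contains at least one positive.
    positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(positives) / len(positives)


always_negative = [0] * 1000  # a "model" that never predicts the positive class

balanced = [1] * 500 + [0] * 500    # 50% positives
imbalanced = [1] * 10 + [0] * 990   # 1% positives

acc_balanced = accuracy(balanced, always_negative)
acc_imbalanced = accuracy(imbalanced, always_negative)
rec_imbalanced = recall(imbalanced, always_negative)

print(acc_balanced, acc_imbalanced, rec_imbalanced)  # 0.5 0.99 0.0
```

On balanced data the useless model is exposed by its 50% accuracy. On the imbalanced data it scores 99% while catching none of the positives, so only recall reveals the failure.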