Classification Metrics — Section 19: Machine Learning Fundamentals

Accuracy alone is misleading on imbalanced data: if fraud is $1\%$ of cases, a model that always predicts "no fraud" reaches $99\%$ accuracy while catching no fraud at all. Better metrics:

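Precision: $\text{TP} / (\text{TP} + \text{FP})$. "Of what I flagged, how much was real?"
Recall (sensitivity, TPR): $\text{TP} / (\text{TP} + \text{FN})$. "Of the real positives, how many did I catch?"
F1: harmonic mean of precision and recall, $2 \cdot \text{Precision} \cdot \text{Recall} / (\text{Precision} + \text{Recall})$. It is high only when both are high.
ROC curve: true positive rate (TPR, i.e. recall) vs false positive rate (FPR, $\text{FP} / (\text{FP} + \text{TN})$) over all decision thresholds; the area under the curve (AUC) summarizes it in one number, with $0.5$ for random ranking and $1$ for perfect ranking.
PR curve: precision vs recall over all decision thresholds; more informative than ROC when classes are heavily imbalanced, because the large number of true negatives keeps FPR low even when most flagged cases are false alarms.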
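
The definitions above can be checked on the fraud example with a few lines of plain Python. This is a minimal sketch, not a reference implementation: the function names, the 1000-case dataset, and the "detector" predictions are all made up for illustration, and returning 0.0 when a denominator is zero is one common convention rather than the only one.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Assumed convention: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative data: 1000 cases, 10 of them fraud (1%).
y_true = [1] * 10 + [0] * 990

# Baseline that always predicts "no fraud".
baseline = classification_metrics(y_true, [0] * 1000)

# Hypothetical detector: catches 8 of the 10 frauds, raises 12 false alarms.
detector_pred = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 978
detector = classification_metrics(y_true, detector_pred)

print(baseline)
print(detector)
```

The baseline scores higher on accuracy than the detector (0.99 vs 0.986) even though its recall is zero, while the detector's precision of 0.4 and recall of 0.8 show what it actually does.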

Pick the metric to match the cost of each kind of error. In a fraud system, false alarms are cheap and missed fraud is expensive, so recall matters most. In cancer screening, missed cases are costly but false alarms also trigger painful follow-ups, so precision matters as well as recall.
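
One way to make "match the cost" concrete is to assign a cost to each false positive and each false negative and pick the decision threshold that minimizes the total. The sketch below assumes a model that outputs a score per case and flags a case when the score is at or above the threshold; the scores, labels, cost values, and function names are hypothetical and chosen only to show how the preferred threshold moves with the cost ratio.

```python
def total_cost(y_true, scores, threshold, cost_fp, cost_fn):
    """Total error cost when flagging every case with score >= threshold."""
    fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)
    fn = sum(1 for t, s in zip(y_true, scores) if t == 1 and s < threshold)
    return cost_fp * fp + cost_fn * fn


def best_threshold(y_true, scores, cost_fp, cost_fn):
    """Try each observed score (plus 'flag nothing') and keep the cheapest."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(candidates,
               key=lambda th: total_cost(y_true, scores, th, cost_fp, cost_fn))


# Hypothetical labels (1 = positive) and model scores.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.9]

# Misses are expensive (fraud-like setting): the threshold drops to favor recall.
recall_leaning = best_threshold(y_true, scores, cost_fp=1, cost_fn=50)

# False alarms are expensive: the threshold rises to favor precision.
precision_leaning = best_threshold(y_true, scores, cost_fp=10, cost_fn=1)

print(recall_leaning, precision_leaning)
```

With these made-up numbers, the expensive-miss setting selects a threshold of 0.45 (all positives caught, one false alarm), while the expensive-false-alarm setting selects 0.7 (no false alarms, one positive missed).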
