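Accuracy alone is misleading on imbalanced data: if fraud is a fraction p of cases, always predicting "no fraud" scores an accuracy of 1 − p, which looks near-perfect when fraud is rare even though no fraud is caught. Better metrics: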
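- Precision: TP / (TP + FP), where TP is true positives and FP is false positives. "Of what I flagged, how much was real?"
- Recall (sensitivity): TP / (TP + FN), where FN is false negatives. "Of the real positives, how many did I catch?"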
- F1: harmonic mean of precision and recall, 2 · precision · recall / (precision + recall); it is high only when both are high.
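- ROC curve: TPR (recall) vs FPR, FP / (FP + TN), over all decision thresholds; the area under the curve (AUC) summarizes it in one number, where 0.5 is chance level and 1.0 is a perfect ranking.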
- PR curve: precision vs recall; better than ROC when classes are heavily imbalanced.
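
To make the definitions above concrete, here is a minimal sketch in plain Python. The data is hypothetical (1000 cases, 10 of them fraud), the helper names `confusion_counts` and `metrics` are made up for illustration, and reporting 0.0 when a denominator is zero is an assumed convention, not a universal one.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from binary predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Assumed convention: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical data: 10 fraud cases followed by 990 legitimate ones.
y_true = [1] * 10 + [0] * 990

# Baseline that always predicts "no fraud".
always_negative = [0] * 1000

# Hypothetical detector: catches 8 of the 10 fraud cases, raises 20 false alarms.
detector = [1] * 8 + [0] * 2 + [1] * 20 + [0] * 970

baseline = metrics(y_true, always_negative)
flagged = metrics(y_true, detector)

print("always negative:", baseline)
print("detector:       ", flagged)
```

On this made-up data the always-negative baseline has the higher accuracy (0.99 vs 0.978) but zero recall, while the detector reaches a recall of 0.8 at a precision of about 0.29.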
Pick the metric to match the cost of each kind of error. In a fraud system, false alarms are cheap and missed fraud is expensive, so recall matters most. In cancer screening, false alarms trigger painful follow-ups, so precision matters too.
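
One way to act on this, sketched under stated assumptions: give each false alarm and each miss a cost, then pick the decision threshold that minimizes total cost on held-out data. The scores, labels, cost values, and the helpers `total_cost` and `best_threshold` below are all hypothetical and only illustrate the idea.

```python
def total_cost(y_true, scores, threshold, cost_fp, cost_fn):
    """Total cost when every score >= threshold is flagged as positive."""
    pairs = list(zip(y_true, scores))
    fp = sum(1 for t, s in pairs if t == 0 and s >= threshold)
    fn = sum(1 for t, s in pairs if t == 1 and s < threshold)
    return cost_fp * fp + cost_fn * fn


def best_threshold(y_true, scores, cost_fp, cost_fn):
    """Return the candidate threshold with the lowest total cost.

    Candidates are the observed scores plus infinity (flag nothing).
    Ties go to the lowest threshold.
    """
    candidates = sorted(set(scores)) + [float("inf")]
    return min(
        candidates,
        key=lambda th: total_cost(y_true, scores, th, cost_fp, cost_fn),
    )


# Hypothetical labels (1 = positive) and model scores.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.9]

# Misses far costlier than false alarms: a low threshold that favors recall.
cheap_alarms = best_threshold(y_true, scores, cost_fp=1, cost_fn=50)

# False alarms costlier than misses: a higher threshold that favors precision.
costly_alarms = best_threshold(y_true, scores, cost_fp=10, cost_fn=5)

print(cheap_alarms, costly_alarms)
```

With these invented numbers the chosen threshold moves from 0.45 to 0.7 as false alarms become more expensive, trading one missed positive for zero false alarms.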