Your First Machine Learning Models

Beyond Accuracy: Precision, Recall & F1

You saw the trap in the last lesson: if 95% of emails are not spam, a model that always guesses "not spam" scores 0.95 accuracy while catching zero spam. On any imbalanced problem, accuracy lies. The fix is to look at which kinds of mistakes the model makes, using the confusion matrix.

For a yes/no classifier (label 1 = the positive class, e.g. "spam"), every prediction falls into one of four boxes:

TP  true positive   predicted 1, actually 1   (caught real spam)
FP  false positive  predicted 1, actually 0   (flagged a good email)
FN  false negative  predicted 0, actually 1   (missed real spam)
TN  true negative   predicted 0, actually 0   (left a good email alone)
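As a small illustration, the four boxes can be tallied by pairing each true label with its prediction. The labels below are made up for this sketch, and the variable names are illustrative rather than part of the exercise.

```python
from collections import Counter

# Hypothetical labels for eight emails (1 = spam, 0 = not spam).
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]

# Count how often each (actual, predicted) pair occurs.
boxes = Counter(zip(y_true, y_pred))

tp = boxes[(1, 1)]  # predicted 1, actually 1
fp = boxes[(0, 1)]  # predicted 1, actually 0
fn = boxes[(1, 0)]  # predicted 0, actually 1
tn = boxes[(0, 0)]  # predicted 0, actually 0

print(tp, fp, fn, tn)  # prints: 2 1 1 4
```

The four counts always add up to the number of predictions, which is a quick sanity check on any tally.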

Three numbers fall out of those counts, and they are what real teams actually report:

Precision = TP / (TP + FP). Of everything you flagged, how much was right? Low precision = crying wolf.
Recall = TP / (TP + FN). Of everything you should have caught, how much did you? Low recall = missing the real cases.
F1 = 2 * P * R / (P + R), the harmonic mean of precision (P) and recall (R). One number that is only high when BOTH precision and recall are high, so the lazy all-negative model (recall 0) scores F1 = 0 and is exposed.
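To see the three formulas side by side with accuracy, here is a worked example on assumed counts: 100 emails, 5 of them spam, and a model that flags 4 emails of which 3 really are spam. The counts are invented for illustration.

```python
# Hypothetical counts: 100 emails, 5 spam; the model flags 4, and 3 of those are spam.
tp, fp, fn, tn = 3, 1, 2, 94

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)   # of the 4 flagged, 3 were right
recall = tp / (tp + fn)      # of the 5 real spam, 3 were caught
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.3f}")
# prints: accuracy=0.97 precision=0.75 recall=0.60 f1=0.667
```

Accuracy looks excellent at 0.97, yet the model misses two of the five spam emails, which recall and F1 make visible.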

Precision and recall trade off: flag everything and recall hits 1.0 but precision tanks; flag nothing and precision is undefined (0/0) while recall drops to 0. F1 keeps you honest. When a denominator is zero (precision when you predicted no positives, recall when there are no actual positives, F1 when P + R is 0), report that metric as 0.0 rather than dividing by zero.
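The two extremes can be checked numerically. The sketch below assumes the same 100-email, 5-spam setup; safe_ratio and scores are hypothetical helper names chosen for this illustration, not the functions the exercise asks for.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def scores(tp, fp, fn):
    """Return (precision, recall, f1) from raw counts (hypothetical helper)."""
    p = safe_ratio(tp, tp + fp)
    r = safe_ratio(tp, tp + fn)
    f1 = safe_ratio(2 * p * r, p + r)
    return p, r, f1


# Assumed data: 100 emails, 5 of them spam.
flag_everything = scores(tp=5, fp=95, fn=0)  # every email predicted as spam
flag_nothing = scores(tp=0, fp=0, fn=5)      # every email predicted as not spam

print(flag_everything)  # precision 0.05, recall 1.0, f1 about 0.095
print(flag_nothing)     # (0.0, 0.0, 0.0)
```

Neither extreme earns a good F1: perfect recall is dragged down by 0.05 precision, and the all-negative model falls back to 0.0 on every metric instead of raising ZeroDivisionError.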

Build two functions. confusion(y_true, y_pred) returns the tuple (tp, fp, fn, tn) over two equal-length lists of 0/1 labels. prf(y_true, y_pred) returns a dict with the keys "precision", "recall", and "f1", computed from those counts (0.0 on any zero denominator). The hidden tests cross-check your numbers against scikit-learn, so the definitions must be exact. Press Run to score an imbalanced classifier.

Your turn

Write confusion(y_true, y_pred) returning (tp, fp, fn, tn) for two equal-length lists of 0/1 labels (1 is the positive class), and prf(y_true, y_pred) returning a dict with the keys "precision", "recall", and "f1", where precision = TP/(TP+FP), recall = TP/(TP+FN), and f1 = 2*P*R/(P+R). Return 0.0 for any metric whose denominator is zero (never divide by zero). Your results are cross-checked against scikit-learn.
