Evaluating RAG & Retrieval

Mean Reciprocal Rank

Precision and recall treat the top k as an unordered bag: a relevant doc at position 1 counts the same as one at position 5. But rank matters. A user reads from the top, and a RAG prompt usually leans hardest on the first chunk. Mean Reciprocal Rank (MRR) rewards placing a relevant result near the top.

For a single query, the reciprocal rank is 1 / (rank of the first relevant doc), counting ranks from 1. If the first relevant doc sits at position 1 you score 1.0; at position 2 you score 0.5; at position 3, 1/3. If no relevant doc appears at all, the score is 0.0. Only the first hit matters, which is exactly the right model for "did the answer show up where the user (or the model) will look?"
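Written as a formula, with rank_q standing for the 1-based position of the first relevant doc for query q (notation introduced here for illustration):

$$
\mathrm{RR}(q) =
\begin{cases}
\dfrac{1}{\mathrm{rank}_q} & \text{if at least one relevant doc is retrieved} \\[4pt]
0 & \text{otherwise}
\end{cases}
$$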

```python
retrieved = ["a", "b", "c"]
relevant = {"b"}
# first relevant doc is "b" at rank 2 -> reciprocal rank = 1/2 = 0.5
```

MRR is just the mean of those reciprocal ranks across a whole set of queries. Two queries scoring 1.0 and 0.5 give an MRR of 0.75. It is a common headline number for question-answering retrieval, where each query typically has one right answer you want surfaced first.
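As a formula, for a query set Q where RR(q) is the reciprocal rank of query q (0 when nothing relevant is retrieved), with the two-query example worked through:

$$
\mathrm{MRR} = \frac{1}{|Q|} \sum_{q \in Q} \mathrm{RR}(q),
\qquad \text{e.g.} \quad \frac{1.0 + 0.5}{2} = 0.75
$$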

What to build.

  • reciprocal_rank(retrieved, relevant) -> walk the ranked list, and the moment you hit a doc in relevant return 1.0 / rank (ranks start at 1). If you never hit one, return 0.0.
  • mrr(queries) -> queries is a list of (retrieved, relevant) pairs; average each pair's reciprocal rank. An empty list returns 0.0, not a crash.
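The two functions described above could look roughly like this. It is a minimal sketch, not the lesson's reference solution; the type hints and the sample queries in the print calls are assumptions added for illustration.

```python
from typing import AbstractSet, List, Sequence, Tuple


def reciprocal_rank(retrieved: Sequence[str], relevant: AbstractSet[str]) -> float:
    # enumerate(..., start=1) makes `rank` the 1-based position directly.
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            # Only the first relevant hit counts, so return immediately.
            return 1.0 / rank
    # No relevant doc anywhere in the ranked list.
    return 0.0


def mrr(queries: List[Tuple[Sequence[str], AbstractSet[str]]]) -> float:
    # Guard against dividing by zero when there are no queries.
    if not queries:
        return 0.0
    total = sum(reciprocal_rank(retrieved, relevant) for retrieved, relevant in queries)
    return total / len(queries)


print(reciprocal_rank(["a", "b", "c"], {"b"}))                  # 0.5
print(mrr([(["x", "y"], {"x"}), (["a", "b", "c"], {"b"})]))    # 0.75
print(mrr([]))                                                  # 0.0
```

Returning from inside the loop is what makes "only the first hit matters" hold: any relevant docs further down the list are never examined.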

Use enumerate(retrieved, start=1) so that the loop index already equals the rank and no +1 adjustment is needed.
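To see what start=1 does, here is enumerate over a small made-up ranked list; each pair is (rank, doc) with ranks beginning at 1 rather than 0.

```python
ranked = ["a", "b", "c"]  # hypothetical ranked list for illustration

for rank, doc in enumerate(ranked, start=1):
    print(rank, doc)
# 1 a
# 2 b
# 3 c
```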

Your turn

Write reciprocal_rank(retrieved, relevant) returning 1.0 / (rank of the first relevant doc) with ranks counted from 1, or 0.0 if no relevant doc is retrieved. Then write mrr(queries) where queries is a list of (retrieved, relevant) pairs, returning the mean of their reciprocal ranks (and 0.0 for an empty list).

