Evaluating RAG & Retrieval

NDCG@k: Graded Relevance

Precision, recall, and MRR all treat relevance as a yes/no flag. But real relevance has shades: a document can be perfect, somewhat useful, or off-topic. NDCG (Normalized Discounted Cumulative Gain) is the metric for graded relevance, and it is the one large search and recommendation teams actually report.

It has three ideas stacked up. First, gain: each result carries a relevance score (say 3 = perfect, 1 = marginal, 0 = irrelevant). Second, a discount by position, because a great result buried at rank 10 helps less than one at rank 1. Discounted Cumulative Gain sums each result's gain divided by log2(position + 1):

DCG = sum( rel_i / log2(i + 1) )   for i = 1, 2, 3, ...

At position 1 the divisor is log2(2) = 1 (no discount); at position 2 it is log2(3); and so on, so later positions are worth steadily less. Third, normalization: DCG by itself is unbounded and hard to compare across queries, so divide by the ideal DCG (IDCG), the DCG you would get if those same relevance scores were sorted from best to worst.

NDCG@k = DCG(top k) / DCG(ideal ordering of the same scores, top k)

That puts every query on a 0.0 to 1.0 scale where 1.0 means "already in the best possible order." A reversed ranking scores strictly below 1.0; an all-zero (nothing relevant) ranking is defined as 0.0.

What to build.

dcg(rels) -> the discounted cumulative gain of a list of relevance scores, using math.log2(i + 1) with i starting at 1.
ndcg_at_k(ranked_rels, k) -> dcg of the top k divided by the dcg of the same scores sorted descending (the ideal), top k. If the ideal DCG is 0, return 0.0 instead of dividing by zero.

Hand-check one: dcg([3, 0, 2]) is 3/log2(2) + 0/log2(3) + 2/log2(4) = 3 + 0 + 1 = 4.0. Press Run to compare a perfect ordering against a reversed one.

Your turn

Write dcg(rels) returning the discounted cumulative gain sum(rel_i / log2(i + 1)) with positions i counted from 1. Then write ndcg_at_k(ranked_rels, k) returning dcg of the top k divided by the ideal DCG (the same relevance scores sorted descending, top k); return 0.0 when the ideal DCG is 0 so an all-zero ranking does not divide by zero.

Spotted a problem in this lesson? Report it

Code · runs in your browser

Output

Back Next lesson

NDCG@k: Graded Relevance

This lesson is locked

Best on a laptop