Prompt Engineering for AI Engineers

Context & Token Budgeting

Every model has a finite context window, and you pay per token. When a conversation grows past the budget, you cannot send all of it, so you decide what to keep. The standard policy: keep the system prompt (it sets the rules) plus the most recent messages that still fit, and drop the oldest.

You will build a simple budgeter. First, a rough token estimator: real tokenizers are model-specific, but a common offline approximation for English text is about 4 characters per token. estimate_tokens(text) returns len(text) // 4, with two edge cases: 0 for empty text, and at least 1 for any non-empty text (so a 3-character string counts as 1 token, not 0).
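The estimator described above can be sketched as follows. This is one possible reading of the rule, not the grader's reference solution; the sample strings are made up for illustration.

```python
def estimate_tokens(text: str) -> int:
    """Rough offline estimate: about 4 characters per token."""
    if not text:
        return 0  # empty text costs nothing
    # Integer division rounds short strings down to 0, so floor at 1.
    return max(1, len(text) // 4)


print(estimate_tokens(""))        # 0
print(estimate_tokens("hi"))      # 1  (2 // 4 == 0, raised to 1)
print(estimate_tokens("a" * 40))  # 10
```

The max(1, ...) floor matters because a short message such as "ok" still occupies space in the context window; counting it as 0 would let an unlimited number of tiny messages through.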

Then fit_context(system, messages, budget) decides what survives:

  • The system prompt is counted first and always kept.
  • Walk the messages from newest to oldest, keeping each one while the running total stays within the budget (total ≤ budget); stop at the first message that would overflow.
  • Return {"kept": [...], "dropped": n, "tokens": total} with kept restored to chronological order.

# three 10-token messages, budget 20, no system
fit_context("", [m1, m2, m3], 20)
# {"kept": [m2, m3], "dropped": 1, "tokens": 20}   # newest two survive
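A minimal sketch of the three rules above, reproducing the example. It assumes each message is a plain string (the lesson does not show the grader's message format) and that "tokens" includes the system prompt's cost; if the system prompt alone exceeds the budget, this version still keeps it and simply keeps no messages.

```python
def estimate_tokens(text):
    """About 4 characters per token; 0 for empty, at least 1 otherwise."""
    if not text:
        return 0
    return max(1, len(text) // 4)


def fit_context(system, messages, budget):
    # Assumption: messages are plain strings, oldest first.
    total = estimate_tokens(system)  # the system prompt is charged first
    kept = []
    for message in reversed(messages):  # newest to oldest
        cost = estimate_tokens(message)
        if total + cost > budget:
            break  # stop at the first message that would overflow
        kept.append(message)
        total += cost
    kept.reverse()  # restore chronological order
    return {"kept": kept, "dropped": len(messages) - len(kept), "tokens": total}


# Three 40-character messages, so 10 estimated tokens each.
m1, m2, m3 = "a" * 40, "b" * 40, "c" * 40
result = fit_context("", [m1, m2, m3], 20)
print(result["kept"] == [m2, m3], result["dropped"], result["tokens"])  # True 1 20
```

Using break rather than skipping the oversized message keeps the surviving history contiguous: once one message is dropped, everything older is dropped too, so the model never sees a conversation with a hole in the middle.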

The grader checks two subtleties. First, a tighter budget keeps fewer messages, and the ones it keeps must be the newest. Second, a non-empty system prompt eats into the budget, so fewer messages fit. Iterate in reverse to pick newest-first, then reverse the kept list so the conversation reads in chronological order. Press Run to grade.
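Both subtleties can be checked by hand with a small experiment. The sketch below is illustrative only: it assumes plain-string messages, and the history and the 10-token system prompt are hypothetical values, not the grader's test data.

```python
def estimate_tokens(text):
    # Assumed estimator: ~4 characters per token, floor of 1 for non-empty text.
    return max(1, len(text) // 4) if text else 0


def fit_context(system, messages, budget):
    total = estimate_tokens(system)  # system prompt is charged first
    kept = []
    for message in reversed(messages):  # newest first
        cost = estimate_tokens(message)
        if total + cost > budget:
            break
        kept.append(message)
        total += cost
    return {"kept": kept[::-1], "dropped": len(messages) - len(kept), "tokens": total}


history = ["a" * 40, "b" * 40, "c" * 40]  # oldest to newest, 10 tokens each
system = "s" * 40                         # hypothetical 10-token system prompt

roomy = fit_context("", history, 30)           # everything fits
tight = fit_context("", history, 10)           # only the newest message fits
with_system = fit_context(system, history, 30) # system takes 10 of the 30

print(len(roomy["kept"]), roomy["tokens"])              # 3 30
print(tight["kept"] == history[-1:], tight["dropped"])  # True 2
print(with_system["kept"] == history[1:], with_system["tokens"])  # True 30
```

With the same budget of 30, adding the system prompt pushes the oldest message out, while the reported total stays at 30 because the system prompt's tokens are counted in it.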

Your turn

Write estimate_tokens(text) returning len(text) // 4 (0 for empty, at least 1 for non-empty). Write fit_context(system, messages, budget) that counts the system prompt first, keeps the newest messages that fit (iterate in reverse), and returns a dict with the keys "kept", "dropped", and "tokens", with kept back in chronological order.
