Syllabus Lesson 171 of 239 · Building Agents (Tool-Use & ReAct)
Building Agents (Tool-Use & ReAct)

Agent Memory & State Accumulation

The ReAct loop you built has amnesia: each step calls a tool and forgets. A real agent has to carry what it learned forward - the facts it looked up, the results tools returned - so a later step can use them. That running notebook is the agent's memory (people also call it the scratchpad or working context). Your code owns it; the model never does.

But context is not free. Everything in memory gets stuffed back into the prompt on the next step, and the model has a fixed context window measured in tokens. Let memory grow forever and you eventually overflow the window (and pay for every token besides). So memory needs two halves: accumulate new facts as you go, and trim back to a budget when it gets too big.

The shape of it

State is one dict: a permanent system note (the agent's standing instructions, which you never throw away) and a list of facts it has gathered:

state = {"system": "You are a research agent.", "facts": []}

Every observation becomes one fact line. After a tool runs you append a readable record of what happened:

state["facts"].append(f"{tool}({args}) -> {observation}")

Trimming to a token budget

Real tokenizers are learned and subword-based; we cannot run one here, so we estimate a token as one whitespace word (len(text.split())) - same trick as the LLM module. The trim rule mirrors what production agents actually do: always keep the system note, then keep as many of the most-recent facts as fit in what is left of the budget. Recency wins because the latest results are usually the ones the next step needs.

budget = max_tokens - est_tokens(state["system"])
# walk facts newest-first, keep while they fit, then stop

Walk the facts from newest to oldest, adding each one's token cost until the next fact would blow the budget, then stop. Reverse back so the kept facts stay in their original order. If the budget is tiny you may keep zero facts, but the system note always survives.

In a live agent the model would read render(state) at the top of every step and the same accumulate-then-trim cycle would run after every tool call, so the window never overflows. The on-device tutor can drive this loop, but grading only ever inspects the deterministic state. Press Run to accumulate a short transcript and watch the oldest facts fall away under a tight budget.

Your turn

Build the agent's memory. Write new_state(system_note) -> {"system": note, "facts": []}; record(state, tool, args, observation) which appends the fact line "<tool>(<args>) -> <observation>" and returns the state; est_tokens(text) -> whitespace token count; and trim(state, max_tokens) which keeps the system note plus the most-recent facts whose tokens (the system note counts against the budget) fit in max_tokens. Then write accumulate(system_note, transcript, max_tokens): start a fresh state, record every (tool, args, observation) step in order, trim once at the end, and return the trimmed state.

Spotted a problem in this lesson? Report it

Code · runs in your browser
Output