The ReAct Loop Over Mock Tools
Strip away the marketing and an "AI agent" is a loop. The model does not magically take actions; your code does. The pattern is ReAct: Reason about what to do, take an Action by calling a tool, Observe the result, then loop until the task is done. The model only ever produces text; the loop around it turns that text into real tool calls and feeds the results back.
The reasoning step is the LLM, and it is non-deterministic, so we cannot unit-test it. But everything around it is deterministic and is exactly what breaks in production: the dispatch, the observation handling, the stop condition. So we test the loop with the model's job replaced by a scripted plan - a fixed list of (tool, args) actions. That is how you actually unit-test an agent: pin the plan, assert the trace.
You will build run_agent(plan, tools, max_steps). plan is a list of steps like ("lookup", {"country": "France"}). tools is a dict mapping a tool name to a Python function that takes the args dict and returns an observation. For each step you call the matching tool and append a (tool, args, observation) triple to a trace - the audit log of what the agent did.
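To make those shapes concrete, here is a minimal sketch of a plan and a mock tool registry. The tool names lookup and calc, their bodies, and the small CAPITALS table are assumptions made up for illustration; the lesson itself only fixes the (tool, args) step shape and the dict-of-functions registry.

```python
# Hypothetical mock tools: each takes the args dict and returns an observation.
CAPITALS = {"France": "Paris", "Japan": "Tokyo"}  # made-up lookup table

def lookup(args):
    return CAPITALS.get(args["country"], "unknown")

def calc(args):
    return args["a"] + args["b"]

# The registry maps a tool name to the function that implements it.
tools = {"lookup": lookup, "calc": calc}

# A scripted plan: the (tool, args) steps the model would otherwise have chosen.
plan = [
    ("lookup", {"country": "France"}),
    ("calc", {"a": 2, "b": 3}),
    ("finish", {}),
]

# Dispatching one step by hand yields one trace entry.
action, args = plan[0]
observation = tools[action](args)
entry = (action, args, observation)
print(entry)  # ('lookup', {'country': 'France'}, 'Paris')
```

Because the tools are plain functions over a dict, the same registry can be reused by every test without any network or model access.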
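def run_agent(plan, tools, max_steps=10):
    trace = []  # audit log of (action, args, observation) triples
    for step in plan:
        # Guard against a runaway agent: stop once the trace is full.
        if len(trace) >= max_steps:
            break
        action = step[0]
        # A step with no args (missing or None) is treated as an empty dict.
        args = step[1] if len(step) > 1 and step[1] is not None else {}
        if action == "finish":
            break  # a signal, not a tool call, so it is not recorded
        observation = tools[action](args)
        trace.append((action, args, observation))
    return trace

The rules: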
- For each step, call tools[action](args) and record (action, args, observation) in order.
- A "finish" action ends the loop immediately and is not recorded in the trace (it is a signal, not a tool call).
- Stop once the trace already holds max_steps entries, even if the plan has more steps - that guard stops a runaway agent.
- If a step has no args, treat it as an empty dict.
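The rules above can be pinned down as plain assertions, which is what "pin the plan, assert the trace" looks like in practice. This is a sketch, not the lesson's grading suite: run_agent is restated so the snippet runs on its own, the echo tool is hypothetical, and reading "no args" as either a missing second element or None is an assumption.

```python
def run_agent(plan, tools, max_steps=10):
    trace = []
    for step in plan:
        if len(trace) >= max_steps:
            break
        action = step[0]
        # Assumption: "no args" means the second element is missing or None.
        args = step[1] if len(step) > 1 and step[1] is not None else {}
        if action == "finish":
            break
        trace.append((action, args, tools[action](args)))
    return trace

# Hypothetical tool that reports which argument names it received.
tools = {"echo": lambda args: sorted(args)}

# Steps are recorded in order as (action, args, observation).
assert run_agent([("echo", {"b": 1, "a": 2})], tools) == [
    ("echo", {"b": 1, "a": 2}, ["a", "b"])
]

# "finish" stops the loop and is not recorded; later steps never run.
assert run_agent(
    [("echo", {"x": 1}), ("finish", {}), ("echo", {"y": 2})], tools
) == [("echo", {"x": 1}, ["x"])]

# The trace never grows past max_steps, even if the plan is longer.
assert len(run_agent([("echo", {})] * 5, tools, max_steps=3)) == 3

# A step with no args is dispatched with an empty dict.
assert run_agent([("echo",), ("echo", None)], tools) == [
    ("echo", {}, []),
    ("echo", {}, []),
]
```

Each assertion isolates one rule, so a failing check points directly at the part of the loop that regressed.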
In a real app the LLM would emit each next action as JSON and the same loop would run it, so you could swap the scripted plan for live model output without touching the loop. The on-device tutor can drive this very loop live, but grading only ever looks at the deterministic trace. Press Run to watch a two-step plan execute: look up a capital, then do some arithmetic, then finish.
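As a sketch of that swap, the snippet below turns JSON actions into the same (tool, args) steps the loop consumes. The {"tool": ..., "args": ...} message shape and the helper name parse_action are assumptions; the lesson does not specify the JSON schema a model would emit.

```python
import json

def parse_action(raw):
    """Convert one JSON action string into a (tool, args) step."""
    message = json.loads(raw)
    # Assumed schema: {"tool": <name>, "args": {...}}; args may be omitted.
    return (message["tool"], message.get("args") or {})

# Stand-in for live model output: one JSON action per emitted message.
model_output = [
    '{"tool": "lookup", "args": {"country": "France"}}',
    '{"tool": "finish"}',
]

plan = [parse_action(raw) for raw in model_output]
print(plan)  # [('lookup', {'country': 'France'}), ('finish', {})]
```

A list built this way has the same shape as the scripted plan, so it can be passed to run_agent unchanged.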
Write run_agent(plan, tools, max_steps=10) that executes a scripted plan of (tool, args) steps against a mock tool registry. For each step, call tools[action](args) and append (action, args, observation) to a trace list. A "finish" action stops the loop and is not recorded; stop once the trace holds max_steps entries. Return the trace.