Syllabus Lesson 226 of 239 · Project: Tool-Using Research Agent

Wire the Agent's ReAct Loop

Now for the heart of the project: the ReAct loop (Reason -> Act -> Observe), the control loop behind most agent frameworks. The model does not answer in one shot. It reasons one step, picks a tool, runs it, reads the result, and only then decides what to do next. Your job is to write that loop in plain Python over mock tools, so you understand the machinery instead of treating LangChain as a black box.

In a real deployment the "reason" step is an LLM call. Here we replace the model with a small deterministic policy so the whole thing is testable offline (no network, no API). The policy looks at the task and at the observations gathered so far, and returns the next action. That separation is the real lesson: the loop is generic; only the policy is smart.

A task is a tiny spec the policy reads:

task = {"need": ["team size", "office floors"], "op": "*"}
# "look up these facts, then combine them with this operator"

You will build run_agent(task, tools, max_steps=8). It returns a trace: an ordered list of (thought, tool, args, observation) tuples, one per step. That trace is the agent's whole reasoning history, and it is exactly what you log, replay, and (in the next lessons) evaluate.
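To make the trace shape concrete, here is a hand-written illustration of what the trace for the task above might look like. The fact values (12 and 3), the thought strings on the calc and finish steps, and the empty args on the finish step are assumptions for the example, not the grader's expected output. final_answer and tool_sequence are hypothetical helper names.

```python
# Hand-written illustration of a trace. Fact values, thought strings, and the
# empty finish args are assumptions, not the lesson's reference data.
example_trace = [
    ("need team size", "search", {"query": "team size"}, 12),
    ("need office floors", "search", {"query": "office floors"}, 3),
    ("combine facts", "calc", {"op": "*", "a": 12, "b": 3}, 36),
    ("done", "finish", {}, 36),
]


def final_answer(trace):
    """Return the observation of the closing finish step, or None if there is none."""
    if not trace:
        return None
    _thought, tool, _args, observation = trace[-1]
    return observation if tool == "finish" else None


def tool_sequence(trace):
    """Drop the thoughts and keep the (tool, args, observation) triples."""
    return [(tool, args, obs) for _thought, tool, args, obs in trace]


print(final_answer(example_trace))
print([step[0] for step in tool_sequence(example_trace)])
```

Keeping the thought in the tuple but comparing only the (tool, args, observation) part is one way to let a check ignore wording while still pinning down the routing.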

The reasoning the loop encodes, each iteration:

  • If any needed fact is still unknown, the next action is search for the first missing one. Record the observation and remember the fact.
  • Once all facts are gathered and the task has an op, the next action is calc to combine the two facts. Remember the result.
  • When there is nothing left to do, emit a final finish action whose observation is the answer, and stop.

A starting skeleton that covers the search branch; the rest is left for you:

def run_agent(task, tools, max_steps=8):
    trace, facts = [], {}
    for _ in range(max_steps):
        missing = [k for k in task["need"] if k not in facts]
        if missing:
            key = missing[0]
            obs = tools["search"]({"query": key})   # ACT + OBSERVE
            facts[key] = obs
            trace.append(("need " + key, "search", {"query": key}, obs))
            continue
        ...   # compute if there is an op, else finish
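The split between a generic loop and a smart policy can be made explicit by pulling the decision into its own function. The sketch below is one possible arrangement, not the lesson's reference solution: next_action and run_with_policy are hypothetical names, the thought strings and the empty finish args are assumptions, a missing or None op is assumed to mean "no operator", and the stub tools hold made-up values.

```python
def next_action(task, facts, result):
    """Deterministic stand-in for the LLM "reason" step.

    Returns (thought, tool, args) for the next step.
    """
    missing = [k for k in task["need"] if k not in facts]
    if missing:
        key = missing[0]  # first missing fact first
        return ("need " + key, "search", {"query": key})
    op = task.get("op")  # assumption: absent or None means no operator
    if op and result is None:
        a, b = (facts[k] for k in task["need"][:2])
        return ("combine facts", "calc", {"op": op, "a": a, "b": b})
    return ("done", "finish", {})


def run_with_policy(task, tools, policy, max_steps=8):
    """Generic loop: ask the policy, act, observe, record."""
    trace, facts, result = [], {}, None
    for _ in range(max_steps):
        thought, tool, args = policy(task, facts, result)
        if tool == "finish":
            # Answer: the calc result, or the single fact when there is no op.
            if result is not None:
                answer = result
            else:
                answer = facts.get(task["need"][0]) if task["need"] else None
            trace.append((thought, tool, args, answer))
            break
        obs = tools[tool](args)  # ACT + OBSERVE
        if tool == "search":
            facts[args["query"]] = obs
        else:
            result = obs
        trace.append((thought, tool, args, obs))
    return trace


# Stub tools with made-up facts, only to exercise the loop.
stub_facts = {"team size": 12, "office floors": 3}
stub_tools = {
    "search": lambda args: stub_facts.get(args["query"], "not found"),
    "calc": lambda args: args["a"] * args["b"],  # stub: handles "*" only
}

two_fact = run_with_policy(
    {"need": ["team size", "office floors"], "op": "*"}, stub_tools, next_action
)
one_fact = run_with_policy({"need": ["team size"]}, stub_tools, next_action)
print([step[1] for step in two_fact])
print(one_fact[-1][3])
```

Swapping next_action for a function that calls a model would leave run_with_policy untouched, which is the point of keeping the two apart.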

The tools dict plays the same role as the registry from lesson 1: search(args) resolves a fact, and calc(args) applies op to a and b. The final answer is the calc result when there is an op, or the single looked-up fact when there is not. The grader checks the step-by-step (tool, args, observation) sequence and the final answer, so the loop has to route correctly rather than guess. It also runs a second, different task, so a hardcoded trace cannot pass. Press Run to watch a two-fact task reason its way to an answer.
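A minimal sketch of the two tools follows. It assumes a made-up fact table (the values 12 and 3 are placeholders, not the lesson's data) and returns a string for an unknown operator or a division by zero; the lesson does not specify either case, so treat both as assumptions. The optional facts parameter is also an addition for testing; make_tools() still works with no arguments.

```python
import operator

# Hypothetical fact table; the lesson's real table is not shown here.
FACTS = {"team size": 12, "office floors": 3}

# Map each supported operator symbol to the function that applies it.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def make_tools(facts=FACTS):
    """Build the registry: each tool takes one args dict and returns an observation."""

    def search(args):
        # Resolve the query against the fact table, "not found" if absent.
        return facts.get(args["query"], "not found")

    def calc(args):
        fn = OPS.get(args["op"])
        if fn is None:
            return "unknown op"  # assumption: report instead of raising
        try:
            return fn(args["a"], args["b"])
        except ZeroDivisionError:
            return "division by zero"  # assumption: report instead of raising

    return {"search": search, "calc": calc}


tools = make_tools()
print(tools["search"]({"query": "team size"}))
print(tools["search"]({"query": "parking spots"}))
print(tools["calc"]({"op": "*", "a": 12, "b": 3}))
```

Because every tool has the same one-argument signature, the loop can dispatch with tools[name](args) and never needs to know which tool it is calling.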

Your turn

Write make_tools() returning a dict with search(args) (resolve args["query"] against a small fact table; "not found" if absent) and calc(args) (apply args["op"] over args["a"], args["b"] for + - * /). Then write run_agent(task, tools, max_steps=8) implementing the reason-act-observe loop: search each still-missing key in task["need"] (first missing first), then calc if task["op"] is set, then a final finish step. Return the ordered list of (thought, tool, args, observation) tuples; the final answer is the calc result, or the single fact when there is no op.
