Wire the Agent's ReAct Loop
Now the heart of the project: the ReAct loop (Reason -> Act -> Observe), the control loop behind essentially every agent framework. The model does not answer in one shot. It reasons one step, picks a tool, runs it, reads the result, and only then decides what to do next. Your job is to write that loop in plain Python over mock tools, so you understand the machinery instead of treating a framework such as LangChain as a black box.
In a real deployment the "reason" step is an LLM call. Here we replace the model with a small deterministic policy so the whole thing is testable offline (no network, no API). The policy looks at the task and at the observations gathered so far, and returns the next action. That separation is the real lesson: the loop is generic; only the policy is smart.
A task is a tiny spec the policy reads:
    task = {"need": ["team size", "office floors"], "op": "*"}
    # "look up these facts, then combine them with this operator"

You will build run_agent(task, tools, max_steps=8). It returns a trace: an ordered list of (thought, tool, args, observation) tuples, one per step. That trace is the agent's whole reasoning history, and it is exactly what you log, replay, and (next lessons) evaluate.
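To make the trace shape concrete, here is a hand-written example of what a trace for the task above might look like, plus two small helpers for reading it. This is only an illustration: the fact values (12 and 3), the thought strings on the calc and finish steps, the empty args on the finish step, and the helper names format_trace and final_answer are all assumptions, not something the lesson specifies.

```python
# Hypothetical trace for the two-fact task. The values 12 and 3, the
# "combine facts" / "done" thoughts, and the empty finish args are assumptions.
trace = [
    ("need team size", "search", {"query": "team size"}, 12),
    ("need office floors", "search", {"query": "office floors"}, 3),
    ("combine facts", "calc", {"op": "*", "a": 12, "b": 3}, 36),
    ("done", "finish", {}, 36),
]


def format_trace(trace):
    """Render one readable line per (thought, tool, args, observation) step."""
    lines = []
    for step, (thought, tool, args, observation) in enumerate(trace, start=1):
        lines.append(f"{step}. {thought} | {tool}({args}) -> {observation}")
    return lines


def final_answer(trace):
    """Read the answer off the last step, which is the finish step."""
    return trace[-1][3]


for line in format_trace(trace):
    print(line)
print("answer:", final_answer(trace))
```

Because every step is a plain tuple, logging or replaying a run is just iterating over the list.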
The reasoning the loop encodes, each iteration:
- If any needed fact is still unknown, the next action is search for the first missing one. Record the observation and remember the fact.
- Once all facts are gathered and the task has an op, the next action is calc to combine the two facts. Remember the result.
- When there is nothing left to do, emit a final finish action whose observation is the answer, and stop.
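The three rules above can also be read as a standalone policy function, kept separate from the loop that drives it. The following is a minimal sketch under assumptions: the name next_action, the result parameter used to tell whether calc has already run, the thought strings other than "need " + key, and the empty args on finish are all illustrative choices rather than part of the lesson's spec.

```python
def next_action(task, facts, result=None):
    """Hypothetical policy: return (thought, tool, args) for the next step.

    facts maps each already-searched key to its observation; result is the
    calc output if calc has already run, else None.
    """
    # Rule 1: search for the first fact that is still unknown.
    missing = [k for k in task["need"] if k not in facts]
    if missing:
        key = missing[0]
        return ("need " + key, "search", {"query": key})

    # Rule 2: all facts known, an op is set, and calc has not run yet.
    if task.get("op") and result is None:
        a, b = (facts[k] for k in task["need"][:2])  # assumes two facts
        return ("combine facts", "calc", {"op": task["op"], "a": a, "b": b})

    # Rule 3: nothing left to do.
    return ("done", "finish", {})


task = {"need": ["team size", "office floors"], "op": "*"}
print(next_action(task, {}))
print(next_action(task, {"team size": 12, "office floors": 3}))
print(next_action(task, {"team size": 12, "office floors": 3}, result=36))
```

Since the function only inspects the task and what has been observed so far, it can be tested without running any tool.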
    def run_agent(task, tools, max_steps=8):
        trace, facts = [], {}
        for _ in range(max_steps):
            missing = [k for k in task["need"] if k not in facts]
            if missing:
                key = missing[0]
                obs = tools["search"]({"query": key})  # ACT + OBSERVE
                facts[key] = obs
                trace.append(("need " + key, "search", {"query": key}, obs))
                continue
            ...  # compute if there is an op, else finish

The tools dict is your registry from lesson 1 in spirit: search(args) resolves a fact, calc(args) applies op to a and b. The final answer is the calc result when there is an op, or the single looked-up fact when there is not.

The grader checks the step-by-step (tool, args, observation) sequence and the final answer, so the loop has to actually route correctly, not guess. It also runs a second, different task, so a hardcoded trace cannot pass. Press Run to watch a two-fact task reason its way to an answer.
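One possible shape for the mock tools described here is sketched below. The contents of the fact table are invented for illustration, the module-level names FACTS and OPS are assumptions, and the sketch lets division by zero and unknown operators raise instead of handling them, since the lesson does not say what should happen in those cases.

```python
import operator

# Hypothetical fact table; the real lesson's facts may differ.
FACTS = {"team size": 12, "office floors": 3}

# Map each supported operator symbol to a two-argument function.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def make_tools():
    """Return a registry mapping tool name to a callable taking an args dict."""

    def search(args):
        # Resolve the query against the fact table, "not found" if absent.
        return FACTS.get(args["query"], "not found")

    def calc(args):
        # Apply args["op"] to args["a"] and args["b"].
        return OPS[args["op"]](args["a"], args["b"])

    return {"search": search, "calc": calc}


tools = make_tools()
print(tools["search"]({"query": "team size"}))
print(tools["search"]({"query": "parking spots"}))
print(tools["calc"]({"op": "*", "a": 12, "b": 3}))
```

Every tool takes a single args dict, so the loop can call tools[name](args) without knowing which tool it is dispatching to.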
Write make_tools() returning a dict with search(args) (resolve args["query"] against a small fact table; "not found" if absent) and calc(args) (apply args["op"] over args["a"], args["b"] for + - * /). Then write run_agent(task, tools, max_steps=8) implementing the reason-act-observe loop: search each still-missing key in task["need"] (first missing first), then calc if task["op"] is set, then a final finish step. Return the ordered list of (thought, tool, args, observation) tuples; the final answer is the calc result, or the single fact when there is no op.