Neural-Net Intuition, LLMs & AI Capstone

AI Capstone: your code drives an on-device LLM

This is the capstone, and you are the architect. The whole track has been building to one idea: your Python orchestrates the model. The LLM is just one component you call. Your code decides what data it sees, how the request is shaped, and what happens when no model is loaded. You will build that orchestration from a near-blank file.

Every real AI feature is the same loop. Your code does the reliable work, the model does the wording:

  • Compute the facts in pandas. Code is exact at arithmetic; a model is not. So the row count, total, average, and top category come from a DataFrame, never from the LLM. Keeping the arithmetic out of the model is the most reliable way to keep hallucinated numbers out of the result.
  • Shape the request yourself. A reusable prompt template keeps the system instruction (who the model should be) separate from the user message (what it should do), then assembles them into one string that ends on a turn marker so the model knows where to start writing.
  • Always have an offline path. The on-device model needs WebGPU and is non-deterministic, so you also write a rule-based generator that turns the same facts into a sentence. The feature works even with no model at all.
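The first step of that loop, computing the facts in code, might look like the sketch below. The sample rows are hypothetical (the lesson's embedded table has its own 10 rows), and the function body is one possible implementation rather than the lesson's reference solution.

```python
import pandas as pd

# Hypothetical sample data; the lesson's embedded table has its own rows.
df = pd.DataFrame({
    "category": ["food", "rent", "food", "fun", "rent"],
    "amount": [12.50, 800.00, 30.25, 45.00, 800.00],
})

def spend_stats(df):
    """Compute every number in code so the model never does arithmetic."""
    # Sum amounts per category, then pick the largest sum.
    by_category = df.groupby("category")["amount"].sum()
    return {
        "rows": int(len(df)),
        "total": round(float(df["amount"].sum()), 2),
        "average": round(float(df["amount"].mean()), 2),
        "top_category": str(by_category.idxmax()),
        "top_amount": round(float(by_category.max()), 2),
    }

stats = spend_stats(df)
print(stats)
```

Converting to plain int, float, and str keeps the dict free of NumPy types, which makes it easier to print or format into a prompt later.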

What you will build

Two functions, from scratch. build_prompt(system, user) returns one clean string that contains both parts and ends on Assistant:. insight(df) computes the stats on an embedded spending table and returns a plain-English line that names your biggest category and the total. A small spend_stats(df) helper does the pandas work so both the report and the insight read from the same numbers.
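As a rough illustration of the prompt shape, here is a minimal sketch of build_prompt. The "System:" and "User:" labels are an assumed convention, not something the lesson prescribes; the only requirements stated above are that both parts appear and that the string ends on Assistant:.

```python
def build_prompt(system, user):
    """Join the system instruction and the user message into one string.

    The "System:" / "User:" labels are an assumption for this sketch; the
    trailing "Assistant:" turn marker is what the lesson asks for.
    """
    return f"System: {system.strip()}\nUser: {user.strip()}\nAssistant:"

# Hypothetical user message, standing in for a report your code computed.
prompt = build_prompt(
    "You explain budgets in one warm sentence.",
    "Top category: rent. TOTAL: 1687.75",
)
print(prompt)
```

Ending on the turn marker with nothing after it signals that the next text belongs to the assistant, so the model continues from there instead of extending the user message.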

The AI step (try it in the app)

Floati runs its model client-side with WebLLM on WebGPU: no account, no key, no cloud, and your prompts never leave the machine. The app wires a single shared engine onto window._floatiTutor. Once you have built build_prompt, this is the real call (it lives outside the grader because it needs a GPU and is non-deterministic):

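// Floati already created window._floatiTutor (a WebLLM engine).
// build_prompt and report are shown as if callable from JavaScript; they stand
// for the function and the report string your Python code produced.
const prompt = build_prompt("You explain budgets in one warm sentence.", report);
const reply = await window._floatiTutor.chat.completions.create({
  // The whole assembled prompt travels as a single user message.
  messages: [{ role: "user", content: prompt }],
  temperature: 0.5, // some randomness, so replies vary between runs
  max_tokens: 120, // cap the length of the reply
});
// The generated text is on the first choice.
console.log(reply.choices[0].message.content);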

Notice the model only ever sees the report string your Python produced, so the numbers are already correct before the model speaks. Press Run to see your offline pipeline print its stats and its rule-based insight; that printed report is exactly what you would hand to the model above. The grader checks your prompt shape and your deterministic insight, never any model output.

Your turn

Build the orchestration. The spending DataFrame df is embedded for you (10 rows, columns category and amount). Write three things:

  • spend_stats(df) returns a dict with keys rows, total, average, top_category, top_amount. Round the total and average to 2 decimals; the top category is the one with the largest summed amount.
  • build_prompt(system, user) returns one string that contains both system and user and ends on Assistant: so the model continues from there.
  • insight(df) returns a non-empty plain-English string (the rule-based fallback) that names the top category and shows a TOTAL: <amount> with 2-decimal money.

Then set report = insight(df) and print(report).
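If you want a reference point for the rule-based fallback, the sketch below shows one way insight could be written. It computes the figures inline to stay self-contained, whereas the lesson has insight read them from spend_stats; the sample rows and the exact sentence wording are hypothetical.

```python
import pandas as pd

def insight(df):
    """Rule-based fallback: turn computed facts into one sentence, no model."""
    if df.empty:
        # Still return a non-empty line with a TOTAL when there is no data.
        return "No spending recorded yet. TOTAL: 0.00"
    by_category = df.groupby("category")["amount"].sum()
    top_category = by_category.idxmax()
    top_amount = float(by_category.max())
    total = float(df["amount"].sum())
    # :.2f formats money with exactly two decimals.
    return (
        f"Your biggest category is {top_category} at {top_amount:.2f}. "
        f"TOTAL: {total:.2f}"
    )

# Hypothetical sample data, not the lesson's embedded table.
df = pd.DataFrame({
    "category": ["food", "rent", "food"],
    "amount": [12.5, 800.0, 30.25],
})
report = insight(df)
print(report)  # Your biggest category is rent at 800.00. TOTAL: 842.75
```

Because the sentence is built only from values computed in pandas, the same input always produces the same output, which is what lets a grader check it.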
