Multi-Turn Conversation Memory
A chatbot has no memory of its own: the model sees only what you send it on each call. To hold a conversation you replay the history, meaning the system message plus the back-and-forth turns so far. But history grows without bound and the context window does not, so you need a memory that keeps the conversation and trims it to fit before each call.
The policy is the one production systems commonly converge on. Always keep the system message, because it carries the instructions and persona that must never be lost. Then keep as many of the most-recent turns as fit the budget, dropping the oldest ordinary turns first, because recent context matters most for a coherent reply. Token counts are estimated with the familiar rule of thumb, chars / 4.
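As a rough illustration of the chars / 4 estimate, here is a minimal sketch. The helper name estimate_tokens is hypothetical, and rounding down with integer division is an assumption; the lesson does not say how fractions are handled.

```python
def estimate_tokens(text: str) -> int:
    """Rough token count using the chars / 4 rule.

    Assumption: fractions are rounded down (integer division).
    """
    return len(text) // 4


# "Tell me about Rome." is 19 characters long, so the estimate is 19 // 4.
print(estimate_tokens("Tell me about Rome."))  # 4
```

The estimate is deliberately crude: it needs no tokenizer, so the trimming decision stays cheap and deterministic.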
convo = Conversation(system="You are a careful assistant.")
convo.add("user", "Tell me about Rome.")
convo.add("assistant", "Rome is the capital of Italy.")
context = convo.trimmed_context(max_tokens=64)  # what you actually send
This is exactly the object that sits in front of any chat model, including an on-device WebLLM engine: you accumulate turns, then pass trimmed_context(...) as the messages for the next call. No model runs here; the trimming logic is deterministic and graded directly.
Build a Conversation class. The constructor takes an optional system message (pinned, never dropped). add(role, content) appends a turn. trimmed_context(max_tokens) returns the kept turns in chronological order: every system message, plus the most-recent ordinary turns whose estimated tokens (summed with the chars / 4 rule) fit within max_tokens, dropping oldest first. A short history under a generous budget is kept whole.
In short: build a Conversation class that holds turns in order. add(role, content) appends a turn, and trimmed_context(max_tokens) keeps the system message(s) plus the most-recent turns that fit a chars / 4 token budget (dropping oldest first), returned in chronological order. The solution is graded on four points: a short history is kept whole; a tight budget drops the oldest turns but keeps the system message and the newest turn; the order stays chronological; the token estimate is correct.
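One possible shape for the class is sketched below; it is not a reference solution. Several details are assumptions the brief leaves open: turns are stored as {"role", "content"} dicts, the estimate rounds down, system messages are pinned and do not count against max_tokens, and the backward walk stops at the first ordinary turn that does not fit (so a newest turn larger than the whole budget would be dropped).

```python
from typing import Dict, List, Optional

Message = Dict[str, str]  # assumed shape: {"role": ..., "content": ...}


def estimate_tokens(text: str) -> int:
    # chars / 4 rule; rounding down is an assumption.
    return len(text) // 4


class Conversation:
    def __init__(self, system: Optional[str] = None) -> None:
        self.turns: List[Message] = []
        if system is not None:
            self.add("system", system)

    def add(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})

    def trimmed_context(self, max_tokens: int) -> List[Message]:
        # System messages are pinned: always kept, never charged to the budget
        # (an assumption about how the budget is applied).
        keep = [turn["role"] == "system" for turn in self.turns]
        used = 0
        # Walk from newest to oldest so the oldest ordinary turns drop first.
        for i in range(len(self.turns) - 1, -1, -1):
            turn = self.turns[i]
            if turn["role"] == "system":
                continue
            cost = estimate_tokens(turn["content"])
            if used + cost > max_tokens:
                break  # everything older than this turn is dropped too
            used += cost
            keep[i] = True
        # Filtering the original list preserves chronological order.
        return [turn for turn, kept in zip(self.turns, keep) if kept]


convo = Conversation(system="You are a careful assistant.")
convo.add("user", "Tell me about Rome.")                    # 19 chars, 4 tokens
convo.add("assistant", "Rome is the capital of Italy.")     # 29 chars, 7 tokens
print([m["role"] for m in convo.trimmed_context(max_tokens=64)])  # ['system', 'user', 'assistant']
print([m["role"] for m in convo.trimmed_context(max_tokens=8)])   # ['system', 'assistant']
```

Stopping at the first turn that does not fit keeps the result a contiguous run of recent turns, which avoids sending a reply without the question that prompted it.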