Syllabus Lesson 223 of 239 · Project: Production AI Gateway

Guardrail Pipeline + Trace

This is the last piece, and it is the one that lets you put the gateway in front of real users: safety and observability. Before any text reaches the model you run it through a guard pipeline that blocks prompt-injection attempts and scrubs personal data. And after every call you record a trace event, then roll those events up into the summary an on-call engineer actually looks at. Together these two ideas, "guardrail" and "cost-account," complete the resume line you started this module with.

The guard. Write guard(request) that inspects the request text and returns a verdict dict like {"allowed": bool, "reason": ..., "text": ..., "redacted": [...]}. Two checks:

  • Injection block: if the text matches a known attack pattern (think "ignore all previous instructions," "reveal the system prompt," "you are now..."), refuse it outright with allowed=False. A small list of case-insensitive regexes is enough here.
  • PII redaction: otherwise, scrub emails and phone numbers out of the text, replacing them with [EMAIL] and [PHONE] placeholders, and return the cleaned text with allowed=True plus the list of kinds you found.
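
import re

# Known prompt-injection phrasings, matched case-insensitively.
INJECTION = re.compile(r"ignore (?:all )?previous instructions|reveal .*system prompt|you are now", re.I)

def guard(request):
    text = request["text"]
    # Refuse outright on any injection match; nothing gets redacted.
    if INJECTION.search(text):
        return {"allowed": False, "reason": "injection", "text": text, "redacted": []}
    # redact() is the PII helper you write; it returns (clean_text, kinds).
    clean, kinds = redact(text)
    return {"allowed": True, "reason": "ok", "text": clean, "redacted": kinds}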
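
guard leans on a redact helper that the lesson leaves to you. One possible sketch follows; the two patterns and the kind names "email" and "phone" are assumptions made for illustration, and the regexes are deliberately simple rather than a complete PII detector.

```python
import re

# Assumed patterns: good enough for a demo, not exhaustive PII detection.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact(text):
    """Swap emails and phone numbers for placeholders; return (clean, kinds)."""
    kinds = []
    # Emails go first so digits inside an address are never read as a phone.
    for kind, pattern in (("email", EMAIL), ("phone", PHONE)):
        text, count = pattern.subn(f"[{kind.upper()}]", text)
        if count:
            kinds.append(kind)
    return text, kinds

clean, kinds = redact("Mail jo@example.com or call 555-123-4567 today")
print(clean)  # Mail [EMAIL] or call [PHONE] today
print(kinds)  # ['email', 'phone']
```

Returning the kinds rather than the matched values keeps the personal data itself out of the verdict, so it never lands in a trace or log.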

The trace. Every request leaves a breadcrumb: which handler ran, whether it was blocked, and what it cost. Write aggregate_trace(events) over a list of dicts with keys handler, blocked, and cost, returning a summary with the total number of calls, the number of blocks, the total_cost of the calls that were actually allowed, and a by_handler breakdown of calls and cost per handler. This is a perfect job for a pandas groupby:

import pandas as pd

df = pd.DataFrame(events)
allowed = df[~df["blocked"]]  # keep only the calls that were served
grouped = allowed.groupby("handler")["cost"].agg(["count", "sum"])  # calls and cost per handler

Blocked requests cost nothing and must not inflate total_cost, so sum cost only over the allowed rows. Handle the empty-trace case so a fresh gateway reports clean zeros instead of crashing. Press Run to block an injection, redact a request full of contact details, let a benign one through, and print a rolled-up trace.
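
Putting those rules together, one way aggregate_trace might look is sketched below. It assumes by_handler covers allowed calls only (following the groupby over allowed rows) and that each per-handler entry is a dict with "calls" and "cost" keys; neither detail is fixed by the lesson text.

```python
import pandas as pd

def aggregate_trace(events):
    """Roll trace events (dicts with handler, blocked, cost) into a summary."""
    if not events:
        # An empty DataFrame has no "blocked" column, so return zeros early.
        return {"calls": 0, "blocks": 0, "total_cost": 0.0, "by_handler": {}}
    df = pd.DataFrame(events)
    blocked = df["blocked"].astype(bool)
    allowed = df[~blocked]
    # count = allowed calls per handler, sum = their combined cost.
    grouped = allowed.groupby("handler")["cost"].agg(["count", "sum"])
    by_handler = {
        handler: {"calls": int(row["count"]), "cost": float(row["sum"])}
        for handler, row in grouped.iterrows()
    }
    return {
        "calls": len(df),
        "blocks": int(blocked.sum()),
        "total_cost": float(allowed["cost"].sum()),  # blocked rows excluded
        "by_handler": by_handler,
    }

events = [
    {"handler": "chat", "blocked": False, "cost": 0.5},
    {"handler": "chat", "blocked": False, "cost": 0.25},
    {"handler": "embed", "blocked": False, "cost": 0.125},
    {"handler": "chat", "blocked": True, "cost": 0.0},
]
summary = aggregate_trace(events)
print(summary["calls"], summary["blocks"], summary["total_cost"])  # 4 1 0.875
```

Converting the pandas scalars to plain int and float keeps the summary easy to serialize as JSON for a dashboard or log line.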

Your turn

Write guard(request) (request has a "text" key) returning a verdict dict. Block prompt-injection attempts (a small list of case-insensitive regexes) with allowed=False and reason="injection"; otherwise redact emails and phone numbers to [EMAIL]/[PHONE] placeholders and return allowed=True, the cleaned text, and the list of kinds found. Then write aggregate_trace(events) over dicts with handler, blocked, cost returning {"calls", "blocks", "total_cost", "by_handler"}, where total_cost sums only the allowed (non-blocked) calls and by_handler breaks calls/cost down per handler.
