Productionizing LLMs: Cost, Caching & Guardrails

PII Redaction Guardrail

Before user text reaches a model, log, or analytics table, strip out personally identifiable information (PII). For a privacy-first product this is not optional polish; it is the gate that keeps emails, phone numbers, and card numbers out of places they must never land. The technique is pattern matching with regular expressions: each match is replaced with a typed placeholder.

Three common patterns get you most of the way:

  • Email: something like name@host.tld -> replace with [EMAIL].
  • Phone: a 10-digit number, possibly grouped with spaces, dashes, dots, or a leading country code -> [PHONE].
  • Card: 16 digits in four groups of four, separated by optional spaces or dashes -> [CARD].
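As a rough sketch of the three patterns above, the block below pairs the lesson's email regex with one possible phone and card regex. The PHONE and CARD expressions are assumptions made for illustration, not the lesson's reference patterns; real-world formats vary more than this.

```python
import re

# Email pattern as given in the lesson.
EMAIL = re.compile(r"\b[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}\b")
# Assumed phone pattern: optional country code, then 3-3-4 digits
# with optional space, dot, or dash separators.
PHONE = re.compile(r"(?<!\d)(?:\+?\d{1,3}[ .-]?)?\d{3}[ .-]?\d{3}[ .-]?\d{4}(?!\d)")
# Assumed card pattern: four groups of four digits with optional
# space or dash separators.
CARD = re.compile(r"(?<!\d)\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}(?!\d)")

print(EMAIL.sub("[EMAIL]", "write to ana@example.com today"))
print(PHONE.sub("[PHONE]", "call +1 555-123-4567 now"))
print(CARD.sub("[CARD]", "card 4111-1111-1111-1111 on file"))
```

The lookarounds `(?<!\d)` and `(?!\d)` keep a match from starting or ending in the middle of a longer digit run, which is one way to stop the phone pattern from firing inside a card number.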

Mask the longest, most-specific pattern first (the 16-digit card) so a greedier phone pattern cannot chew into it. As you replace, record which types you found, so callers can audit what was scrubbed. Clean text comes back untouched with an empty list.
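To see why the order matters, here is a minimal sketch that applies rules in sequence and records which types fired. The helper name mask_in_order and the deliberately loose LOOSE_PHONE pattern are hypothetical, chosen only to make the failure visible.

```python
import re

# Deliberately loose phone pattern (an assumption for illustration):
# 10 digits with optional separators and no boundary checks.
LOOSE_PHONE = re.compile(r"\d{3}[ .-]?\d{3}[ .-]?\d{4}")
CARD = re.compile(r"\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}")

def mask_in_order(text, rules):
    """Apply (type_name, pattern, placeholder) rules in order.

    Returns (masked_text, sorted list of distinct types found).
    """
    found = set()
    for name, pattern, placeholder in rules:
        text, count = pattern.subn(placeholder, text)
        if count:
            found.add(name)
    return text, sorted(found)

sample = "card 4111111111111111"
card_first = [("card", CARD, "[CARD]"), ("phone", LOOSE_PHONE, "[PHONE]")]
phone_first = list(reversed(card_first))

print(mask_in_order(sample, card_first))   # ('card [CARD]', ['card'])
print(mask_in_order(sample, phone_first))  # ('card [PHONE]111111', ['phone'])
```

With the phone rule first, the first ten digits of the card are masked as a phone and six card digits leak through; with the card rule first, the whole number is replaced before the phone pattern ever sees it.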

import re
EMAIL = re.compile(r"\b[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}\b")

You would run this on the way into any model. For instance, before handing text to an on-device engine you would call safe, _ = redact(user_text) and send safe, never the raw string. The model call itself is illustrative; the redactor is the part you build and grade.
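The gate can be sketched as a thin wrapper that scrubs first and forwards only the masked text. Everything here is hypothetical scaffolding: fake_model stands in for whatever engine you call, answer is an illustrative wrapper name, and this redact is an email-only stand-in for the full redactor you will write.

```python
import re

EMAIL = re.compile(r"\b[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}\b")

def redact(text):
    """Stand-in redactor: masks emails only. The full version also
    handles phones and cards."""
    masked, count = EMAIL.subn("[EMAIL]", text)
    return masked, (["email"] if count else [])

def fake_model(prompt):
    """Hypothetical model call; it simply echoes what it received."""
    return f"model saw: {prompt}"

def answer(user_text, model=fake_model):
    safe, _ = redact(user_text)  # scrub before anything else touches the text
    return model(safe)           # only the masked string is passed on

print(answer("reach me at ana@example.com"))
```

Keeping the redaction inside the wrapper means no caller can reach the model with the raw string by accident.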

Build redact(text). Return a tuple (masked_text, types) where masked_text has every email, phone, and card replaced by its placeholder, and types is a sorted list of the distinct entity types found (from "card", "email", "phone"). Clean text returns unchanged with [].

Your turn

Write redact(text) returning (masked_text, types). Replace every email with [EMAIL], every phone number with [PHONE], and every 16-digit card with [CARD]; types is the sorted list of distinct types found among "card", "email", "phone". A string with an email and a phone returns both masked and ['email', 'phone']; a string containing only a card returns ['card']; clean text comes back unchanged with [].
