Structured Outputs & Function/Tool Calling

Constrained / Schema-Enforced Outputs

In the previous lessons you validated model JSON by hand. Modern practice moves one step earlier: stop the model from emitting bad JSON in the first place. This is constrained decoding (also called structured outputs). You hand the API a schema, and at every step the decoder may only sample tokens that keep the output on track to match that schema. As long as generation runs to completion (it is not cut off by a token limit), the model cannot produce a missing field or a wrong-typed value. It can still produce a well-typed but wrong value, of course.
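To make the per-step masking concrete, here is a toy sketch. It assumes a small finite set of allowed outputs, a made-up vocabulary and a made-up scoring function standing in for the model; `constrained_greedy`, `toy_score`, `PREFS` and `VOCAB` are hypothetical names for illustration. Production systems typically compile the schema into a grammar or state machine over the tokenizer's vocabulary rather than listing outputs, but the idea of discarding every token that would make a valid completion impossible is the same.

```python
from typing import Callable

EOS = "<eos>"  # end-of-sequence marker, only legal once the output is complete


def constrained_greedy(
    allowed: list[str],
    vocab: list[str],
    score: Callable[[str, str], float],
    max_steps: int = 50,
) -> str:
    """Greedy decoding that only picks tokens keeping `out` a prefix of an allowed output."""
    out = ""
    for _ in range(max_steps):
        # The mask: drop every token that would make `out` impossible to complete.
        legal = [t for t in vocab if any(a.startswith(out + t) for a in allowed)]
        if out in allowed:
            legal.append(EOS)  # stopping is legal only when `out` is already complete
        if not legal:
            raise ValueError("no legal continuation for " + repr(out))
        best = max(legal, key=lambda t: score(out, t))
        if best == EOS:
            return out
        out += best
    raise RuntimeError("step budget exhausted before a complete output")


# Toy "model": fixed preferences that ignore the prefix. Unconstrained, it would pick "maybe".
PREFS = {"maybe": 5.0, "yes": 4.0, "fal": 2.0, "tr": 1.0, "ue": 1.0, "se": 1.0, EOS: 0.0}
VOCAB = ["tr", "ue", "fal", "se", "maybe", "yes"]


def toy_score(prefix: str, token: str) -> float:
    return PREFS.get(token, 0.0)


print(constrained_greedy(["true", "false"], VOCAB, toy_score))  # false
```

The model's favorite token, "maybe", is masked out at the first step, so the best legal token "fal" wins, and from there "se" is the only legal continuation.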

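With an OpenAI-style API you pass a response_format that carries a JSON Schema and strict: true. Strict mode also expects every property to appear in required and every object to set additionalProperties to false. (This is illustrative only: we do not call any model here.)

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="...",
    messages=[...],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
                "required": ["name", "age"],
                # strict mode requires every object to forbid extra keys
                "additionalProperties": False,
            },
        },
    },
)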
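The request's "schema" dict can be generated from a simpler field-to-type mapping. Below is a minimal sketch, assuming a hypothetical helper `spec_to_json_schema` that accepts only plain scalar type-name strings (not the {"type", "default"} spec dicts used later in this lesson) and emits the strict-mode shape: every property required, no extra keys.

```python
# Hypothetical mapping: lesson-style type names -> JSON Schema type keywords (scalars only).
JSON_TYPES = {"int": "integer", "float": "number", "str": "string", "bool": "boolean"}


def spec_to_json_schema(schema: dict) -> dict:
    """Build a strict-mode-shaped object schema from {"field": "type-name"} entries."""
    properties = {}
    for field, type_name in schema.items():
        if type_name not in JSON_TYPES:
            # Arrays and nested objects need their own "items"/"properties"; left out of this sketch.
            raise ValueError(field + ": unsupported type " + repr(type_name))
        properties[field] = {"type": JSON_TYPES[type_name]}
    return {
        "type": "object",
        "properties": properties,
        "required": list(schema),       # strict mode: every property is required
        "additionalProperties": False,  # strict mode: no extra keys allowed
    }


print(spec_to_json_schema({"name": "str", "age": "int"}))
```

For {"name": "str", "age": "int"} this reproduces the schema in the request above.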

On the Python side, the de facto standard for declaring and validating that shape is Pydantic. You write a model class, and the library parses, coerces, and validates the data for you:

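from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

text = '{"name": "Ada", "age": 36}'         # raw JSON string returned by the model
record = Person.model_validate_json(text)   # raises pydantic.ValidationError if it does not fit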
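For comparison, here is a rough standard-library stand-in for the Person model above. It is a sketch, not how Pydantic is implemented: `parse_person` is a hypothetical name, it covers only this two-field class, and it performs no coercion at all, whereas Pydantic's default (lax) mode does perform some conversions.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Person:
    name: str
    age: int


def parse_person(text: str) -> Person:
    """Decode JSON, then check each dataclass field's presence and type."""
    data = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for f in fields(Person):
        if f.name not in data:
            raise ValueError(f.name + ": missing")
        value = data[f.name]
        # f.type is the class itself (str or int) because annotations are not postponed here.
        wrong_type = not isinstance(value, f.type)
        sneaky_bool = isinstance(value, bool) and f.type is not bool  # True passes isinstance(_, int)
        if wrong_type or sneaky_bool:
            raise ValueError(f.name + ": expected " + f.type.__name__)
    return Person(**{f.name: data[f.name] for f in fields(Person)})


print(parse_person('{"name": "Ada", "age": 36}'))  # Person(name='Ada', age=36)
```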

Pydantic is not available in this in-browser sandbox, so you will build the same idea by hand with the standard library. That is the core of what these libraries do under the hood, and writing it yourself makes the rules impossible to hand-wave. You are writing validate_against_schema(obj, schema).

The schema is a dict. Each value is either a type name string ("int", "float", "str", "bool", "list", "dict") or a small spec dict like {"type": "int", "default": 0}. Your function returns a result dict {"ok": bool, "value": dict, "errors": list}:
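Because a schema entry comes in two shapes, it helps to normalize both into one before doing anything else. A minimal sketch, assuming hypothetical names `TYPES` and `normalize_spec`; raising on an unknown type name is a design choice of this sketch, not part of the lesson's rules.

```python
# Lesson type names mapped to the Python types that json.loads produces.
TYPES = {"int": int, "float": float, "str": str, "bool": bool, "list": list, "dict": dict}


def normalize_spec(spec):
    """Turn "int" or {"type": "int", "default": 0} into one dict shape."""
    if isinstance(spec, str):
        spec = {"type": spec}
    if spec.get("type") not in TYPES:
        raise ValueError("unknown type: " + repr(spec.get("type")))
    return spec


schema = {"name": "str", "age": {"type": "int", "default": 0}}
print({field: normalize_spec(s) for field, s in schema.items()})
# {'name': {'type': 'str'}, 'age': {'type': 'int', 'default': 0}}
```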

  • If obj is not a dict, return ok=False with one root error and an empty value.
  • For each field in the schema: if it is missing from obj and its spec has a "default", fill in the default. If it is missing and has no default, record an error.
  • If present, check its type. On a mismatch, record an error naming the field.
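  • When a float is wanted, coerce a plain int up to a float (this is exact for integers up to 2**53 in magnitude, which covers ordinary payloads), but do not coerce anything else: the string "42" is still a type error for an int field.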
  • If there were any errors, return ok=False with the collected errors and an empty value; otherwise return ok=True with the validated value.
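The missing/default rule can be sketched on its own, before any type checking. This assumes already-normalized specs and a hypothetical helper `fill_fields`; it also silently drops keys of obj that are not in the schema, which is an assumption because the rules above do not say what to do with extras. Two details matter: testing `"default" in spec` (so a default of 0, "" or None still counts) and copying the default so a mutable one like [] is not shared between calls.

```python
import copy


def fill_fields(obj: dict, schema: dict):
    """Missing/default pass over normalized specs; returns (values, errors)."""
    values, errors = {}, []
    for field, spec in schema.items():
        if field in obj:
            values[field] = obj[field]  # type checking happens in a separate step
        elif "default" in spec:         # key test, not truthiness
            values[field] = copy.deepcopy(spec["default"])  # avoid sharing a mutable default
        else:
            errors.append(field + ": missing")
    return values, errors


schema = {"name": {"type": "str"}, "tags": {"type": "list", "default": []}}
print(fill_fields({"name": "Ada"}, schema))  # ({'name': 'Ada', 'tags': []}, [])
print(fill_fields({}, schema))               # ({'tags': []}, ['name: missing'])
```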

The bool-is-int trap. In Python, bool is a subclass of int, so isinstance(True, int) is True, and JSON true decodes to Python True. A naive type check would therefore let "age": true sail through an int field, and a naive int-to-float coercion would quietly turn it into 1.0. Reject a bool explicitly wherever an int or a float is required:

if expected in (int, float) and isinstance(value, bool):
    errors.append(field + ": expected " + expected.__name__ + ", got bool")
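Putting the bool guard, the single int-to-float coercion and the ordinary type check together, a per-field check might look like the sketch below. `check_type` is a hypothetical helper that returns a (value, error) pair; the order of the tests is the point, since the bool guard has to run before the int-to-float branch, or True would slip through as 1.0.

```python
def check_type(field: str, value, expected: type):
    """Return (value, error); error is None on success. int -> float is the only coercion."""
    if expected in (int, float) and isinstance(value, bool):
        return None, field + ": expected " + expected.__name__ + ", got bool"
    if expected is float and isinstance(value, int):
        return float(value), None  # exact for |value| <= 2**53
    if not isinstance(value, expected):
        return None, field + ": expected " + expected.__name__ + ", got " + type(value).__name__
    return value, None


print(isinstance(True, int))          # True: the trap itself
print(check_type("age", True, int))   # (None, 'age: expected int, got bool')
print(check_type("score", 3, float))  # (3.0, None)
print(check_type("age", "42", int))   # (None, 'age: expected int, got str')
```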

Build it so two different well-formed payloads both pass, a missing field and a wrong type each land in errors, a default fills in, an int coerces to a float, and a sneaky bool is rejected where a number is required.

Your turn

Write validate_against_schema(obj, schema) returning a dict with the keys "ok", "value", and "errors". schema maps each field to a type-name string ("int"/"float"/"str"/"bool"/"list"/"dict") or a spec dict {"type": ..., "default": ...}. Reject a non-dict obj, record an error for each missing (no-default) or wrong-typed field, fill a default when one is given, coerce a plain int up to float, and reject a bool where an int or float is required (the bool-is-int trap). Return ok=True with the validated value only when there are no errors.

