What Is an LLM?
A large language model (LLM), like the ones behind chat assistants, is at heart a giant next-token predictor. A few terms make the whole thing click.
- Tokens. Text is chopped into pieces called tokens, roughly words or word-fragments. The model reads and writes tokens, not raw characters. A rough rule: one token is about four characters of English.
- Prediction. Given the tokens so far, the model outputs a probability for every possible next token. It picks one, appends it, and repeats. That loop is how it writes whole paragraphs.
- Prompt. The text you feed in. Everything the model knows about your request lives in the prompt, so how you phrase it matters a lot (more on that in the next lessons).
- Temperature. A knob on the randomness of that pick. Near 0, the model almost always takes the most likely token (focused, repetitive). Higher values let it sample less likely tokens (more creative, less reliable).
- Hallucination. Because it predicts plausible text rather than looking facts up, an LLM can state wrong things with total confidence. Always verify facts it gives you.
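To make the prediction loop and the temperature knob concrete, here is a minimal Python sketch. It assumes a hypothetical toy "model" (the TOY_LOGITS table below) that scores only a handful of candidate next tokens and looks only at the previous token. A real LLM scores its whole vocabulary using the entire context. The scores are divided by the temperature and then turned into probabilities with a softmax, which is the usual way temperature is applied. A small temperature sharpens the distribution toward the top token, and a large one flattens it.

```python
import math
import random

# Hypothetical toy "model": for each previous token, raw scores (logits)
# for a few candidate next tokens. "<end>" means "stop generating".
TOY_LOGITS = {
    "the": {"cat": 2.0, "dog": 1.5, "idea": 0.2},
    "cat": {"sat": 2.5, "ran": 1.0, "<end>": 0.1},
    "dog": {"ran": 2.0, "sat": 1.0, "<end>": 0.1},
    "idea": {"<end>": 2.0, "sat": 0.1},
    "sat": {"<end>": 3.0, "down": 1.0},
    "ran": {"<end>": 3.0},
    "down": {"<end>": 3.0},
}

def next_token_probs(prev, temperature):
    """Softmax of (logit / temperature) over the candidate next tokens."""
    scaled = {tok: score / temperature for tok, score in TOY_LOGITS[prev].items()}
    top = max(scaled.values())  # subtract the max to avoid overflow in exp
    exps = {tok: math.exp(s - top) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def generate(prompt, temperature=1.0, max_tokens=10, seed=0):
    """Pick a next token, append it, repeat until <end> or max_tokens.

    The prompt's last word must be a key of TOY_LOGITS in this toy setup.
    """
    rng = random.Random(seed)
    tokens = prompt.split()
    for _ in range(max_tokens):
        probs = next_token_probs(tokens[-1], temperature)
        candidates = list(probs)
        pick = rng.choices(candidates, weights=[probs[t] for t in candidates])[0]
        if pick == "<end>":
            break
        tokens.append(pick)
    return tokens

print(generate("the", temperature=0.01))  # ['the', 'cat', 'sat']
print(generate("the", temperature=2.0))   # varies with the seed
```

At temperature=0.01 the top token gets essentially all of the probability, so the first call returns ['the', 'cat', 'sat'] whatever the seed. At 2.0, "dog" and "idea" get a real share, so the result depends on the seed. Because the sketch divides by the temperature, exactly 0 is not allowed here. Real systems typically treat 0 as "always take the most likely token."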
Real tokenizers are subword-based and learned from data. We will not build one here, but we can build the simplest possible tokenizer (split on whitespace) to get a feel for counting tokens. That matters because usage and limits are measured in tokens.
```python
def tokenize(text):
    return text.split()  # whitespace tokens

tokenize("the cat sat")  # ['the', 'cat', 'sat']
```

Write a simple whitespace tokenizer. tokenize(text) returns the list of whitespace-separated tokens (use .split()). count_tokens(text) returns how many there are. Then set tokens = tokenize(sample) and n_tokens = count_tokens(sample).
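To see how far whitespace splitting is from a real subword tokenizer, the sketch below compares a whitespace count with the rough "one token is about four characters of English" rule from the start of the lesson. Both helper names are hypothetical, and the four-character estimate is only a rule of thumb, not what any particular tokenizer would output.

```python
import math

def whitespace_tokens(text):
    """Split on runs of whitespace (spaces, tabs, newlines), like tokenize()."""
    return text.split()

def rough_subword_estimate(text):
    """Hypothetical helper: apply the ~4 characters per token rule of thumb."""
    return math.ceil(len(text) / 4)

sample_text = "Tokenization splits unbelievably long words into pieces."

print(whitespace_tokens("a  b\nc"))          # ['a', 'b', 'c']
print(len(whitespace_tokens(sample_text)))   # 7
print(rough_subword_estimate(sample_text))   # 14
```

Note that .split() with no argument collapses repeated whitespace and keeps punctuation attached ("pieces." is a single token). The four-character estimate suggests a subword tokenizer would produce about twice as many tokens for this sentence, since long words like "unbelievably" tend to be broken into several pieces.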