Intent Classification (TF-IDF + LogisticRegression)
Not every "AI feature" needs a large language model. Many production LLM apps put a tiny, boring classifier in front of the model: it reads the incoming message, decides what kind of request it is (billing? a bug report? an account change?), and routes it accordingly. That decision is fast, free, and works offline, and it is deterministic: the same input always gives the same answer. You only spend an expensive LLM call when you actually need one.
The workhorse for short text is TfidfVectorizer + LogisticRegression. TF-IDF turns each message into a sparse vector of weighted word counts, and logistic regression learns a linear boundary between the intent classes. On a few dozen labelled examples it trains and classifies in milliseconds.
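To see what the two stages produce, here is a small sketch on a made-up six-message corpus. The messages and labels are illustrative and are not the lesson's data. The sketch shows two things. First, TfidfVectorizer returns a sparse matrix with one row per message and one column per vocabulary word. Second, LogisticRegression's score for each class is a linear function of that row: `decision_function` equals `X @ coef_.T + intercept_`.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: two messages per intent, no word shared across intents.
texts = [
    "refund please", "invoice overcharged",
    "app crashes", "button broken",
    "change password", "update email",
]
labels = ["billing", "billing", "bug", "bug", "account", "account"]

vec = TfidfVectorizer()
X = vec.fit_transform(texts)   # scipy sparse matrix, one row per message
print(X.shape)                 # (6, 12): 12 distinct words in the toy corpus
print(sorted(vec.vocabulary_))

clf = LogisticRegression(random_state=0, max_iter=1000).fit(X, labels)

# Each class score is linear in the TF-IDF vector: X @ W.T + b.
scores = clf.decision_function(X)           # shape (6, 3): one column per intent
manual = X @ clf.coef_.T + clf.intercept_
print(np.allclose(scores, manual))          # True
```

The "linear boundary" is literally this weight matrix. Each intent gets one weight per vocabulary word, and the class with the highest weighted sum wins.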
You will wire the two into one object with a Pipeline so the vectorizer and the classifier are fit and applied together as a unit:
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# texts: list of message strings; labels: the matching intent name for each one
pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(random_state=0, max_iter=1000)),
])
pipe.fit(texts, labels)
pipe.predict(["can I get a refund"])  # -> array(['billing'])
```

Why random_state=0? Some of LogisticRegression's solvers ("liblinear", "sag", "saga") shuffle the training data with a pseudo-random generator. The default "lbfgs" solver does not, but pinning the seed costs nothing and keeps training reproducible even if the solver changes. Reproducibility is the whole point of the deterministic layer: your routing must not flip between runs. The course follows this rule everywhere.
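One way to check that promise is to fit the same pipeline twice and confirm that the two models have identical weights. The sketch below runs this check for the default lbfgs solver and for saga, one of the solvers that actually draws on random_state. The helper names `make_pipe` and `fit_twice_agree` and the toy data are hypothetical, not part of the lesson.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy data, just enough to fit a three-intent model.
texts = ["refund please", "invoice overcharged", "app crashes",
         "button broken", "change password", "update email"]
labels = ["billing", "billing", "bug", "bug", "account", "account"]


def make_pipe(solver):
    """Build the lesson's pipeline with a fixed seed and the given solver."""
    return Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", LogisticRegression(random_state=0, max_iter=1000, solver=solver)),
    ])


def fit_twice_agree(solver):
    """Fit two identically seeded pipelines; True if their weights match exactly."""
    with warnings.catch_warnings():
        # saga may stop at max_iter on tiny data; that does not affect repeatability
        warnings.simplefilter("ignore", ConvergenceWarning)
        a = make_pipe(solver).fit(texts, labels)
        b = make_pipe(solver).fit(texts, labels)
    return np.array_equal(a.named_steps["clf"].coef_, b.named_steps["clf"].coef_)


for solver in ("lbfgs", "saga"):
    print(solver, fit_twice_agree(solver))
```

The comparison is exact (`array_equal`, not `allclose`) on purpose. What the routing layer needs is the same weights every time on the same data and library version.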
This is exactly the contrast with the LLM. A model call is non-deterministic, costs money, and adds latency. Here is the kind of fallback you might wire downstream once you know the intent (shown for context, never graded):
```python
# Pseudocode: reach for the model only when the cheap classifier assigns
# the catch-all "general" intent instead of a specific team.
intent = pipe.predict([msg])[0]
if intent == "general":
    reply = await window._floatiTutor.complete(msg)  # the on-device LLM
```

Build two functions. train_classifier(texts, labels) returns a fitted Pipeline (TF-IDF + LogisticRegression with random_state=0). predict(model, new_texts) takes that model and a list of strings and returns a list of predicted labels. Press Run to train on a small support-ticket set and watch it route held-out messages to the right team.
Write train_classifier(texts, labels) returning a fitted sklearn Pipeline of TfidfVectorizer() then LogisticRegression(random_state=0, ...), and predict(model, new_texts) returning a list of predicted labels for a list of input strings. The seed makes it reproducible; the classifier must send distinct held-out messages to their correct intents.
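Here is a minimal sketch of the two functions under the spec above. The toy training messages are made up for illustration and are not the lesson's support-ticket set. Every message has two words and no word is shared between intents, so the predictions below are unambiguous. `max_iter=1000` simply mirrors the earlier pipeline. `predict` converts the NumPy array returned by `Pipeline.predict` into a plain Python list, as the task asks.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def train_classifier(texts, labels):
    """Fit TF-IDF + logistic regression as one unit and return the fitted Pipeline."""
    model = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", LogisticRegression(random_state=0, max_iter=1000)),
    ])
    model.fit(texts, labels)
    return model


def predict(model, new_texts):
    """Return one predicted intent label per input string, as a plain list."""
    return model.predict(new_texts).tolist()


# Hypothetical toy data: two words per message, no word shared between intents.
texts = ["refund please", "invoice overcharged", "payment declined",
         "app crashes", "screen freezes", "button broken",
         "change password", "update email", "delete profile"]
labels = ["billing"] * 3 + ["bug"] * 3 + ["account"] * 3

model = train_classifier(texts, labels)
print(predict(model, ["refund invoice", "app crashes again", "change email"]))
# ['billing', 'bug', 'account']
```

Words the vectorizer never saw during training (like "again") are simply ignored at prediction time. The fitted Pipeline also exposes `predict_proba`, which is one possible signal if you later want to detect low-confidence predictions rather than relying on a catch-all label.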