Your First Model: a Decision Tree
Time to actually train something. A decision tree classifier learns a flowchart of yes/no questions about the features, and each leaf of the flowchart is a predicted label. It is a great first model because it is easy to picture: it just keeps asking questions like "is feature 1 greater than 5?" until it can decide.
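To make the flowchart picture concrete, here is a hand-written sketch of what a small tree amounts to once it is trained. The function name flowchart_predict and the thresholds are made up for illustration; a real tree learns its own questions from the data.

```python
def flowchart_predict(row):
    """Hand-written stand-in for a learned tree (thresholds are invented)."""
    # First question: is feature 1 greater than 5?
    if row[1] > 5:
        # Second question: is feature 0 greater than 5?
        if row[0] > 5:
            return 1  # leaf: both features are large
        return 0      # leaf: only feature 1 is large
    return 0          # leaf: feature 1 is small


print(flowchart_predict([9, 9]))  # follows yes -> yes, lands on a label-1 leaf
print(flowchart_predict([1, 1]))  # follows no, lands on a label-0 leaf
```

Every prediction is a single walk from the top question down to one leaf, which is why trees are easy to read and explain.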
Training one in scikit-learn follows the fit/predict ritual from lesson one:
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y) # learn the questions
clf.predict(X_new) # classify new rows
We pass random_state=0 for reproducibility. As it builds each split, scikit-learn randomly permutes the features it considers, so two unseeded runs can pick different (equally good) splits and grow slightly different trees. Fixing the seed pins that choice so you get the same tree every time.
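One way to check the reproducibility claim yourself is to train two trees with the same seed and compare their learned rules. This is a minimal sketch: X_demo and y_demo are small illustrative values chosen for this example, and export_text is scikit-learn's helper for printing a tree as text.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Illustrative toy data (an assumption for this sketch)
X_demo = [[1, 1], [2, 1], [1, 2], [2, 2],
          [8, 8], [9, 8], [8, 9], [9, 9]]
y_demo = [0, 0, 0, 0, 1, 1, 1, 1]

# Two separate trees, same data, same seed
tree_a = DecisionTreeClassifier(random_state=0).fit(X_demo, y_demo)
tree_b = DecisionTreeClassifier(random_state=0).fit(X_demo, y_demo)

# Compare the learned questions as plain text
rules_a = export_text(tree_a)
rules_b = export_text(tree_b)
same_tree = rules_a == rules_b

print(rules_a)
print("Identical trees:", same_tree)
```

If you drop random_state, the trees may or may not differ on a given run, which is exactly why the seed is worth setting.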
You will train on a tiny, clean dataset. Each row has two features. The label is 0 when both features are small and 1 when both are large, with a clear gap in between:
X = [[1,1],[2,1],[1,2],[2,2], # all label 0 (low/low)
[8,8],[9,8],[8,9],[9,9]] # all label 1 (high/high)
Because the two groups are perfectly separated, the tree can learn a rule that gets every training row right. That will not always happen with messy real data, and the next lessons are all about what to do when it does not. But it makes for a clean first run.
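The "gets every training row right" claim can be checked with the classifier's score method, which reports accuracy. In this sketch the labels y are written out by hand from the comments in the dataset above (four 0s, then four 1s); in the exercise they are provided for you.

```python
from sklearn.tree import DecisionTreeClassifier

X = [[1, 1], [2, 1], [1, 2], [2, 2],   # low/low rows
     [8, 8], [9, 8], [8, 9], [9, 9]]   # high/high rows
y = [0, 0, 0, 0, 1, 1, 1, 1]           # assumed from the dataset comments

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# Fraction of training rows the tree labels correctly
train_accuracy = clf.score(X, y)
# Number of questions on the longest path from root to leaf
depth = clf.get_depth()

print(train_accuracy)  # 1.0, every training row is classified correctly
print(depth)           # 1, a single question is enough to split the groups
```

Because of the clear gap between the groups, one question is all the tree needs. Messier data usually produces deeper trees and a training score below 1.0 once depth is limited.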
One note on the wider AI world: heavyweight tools like TensorFlow and PyTorch cannot run in this in-browser Python, so this course stays with scikit-learn, which is plenty to learn the real ideas.
Features X (8 rows, 2 columns) and labels y are given. Create a DecisionTreeClassifier(random_state=0) named clf, fit it on X and y, then predict on X and store the result in preds. Also predict the single new point [[9, 9]] and store just that one label (an int) in new_pred.