Your First Machine Learning Models

Your First Model: a Decision Tree

Time to actually train something. A decision tree classifier learns a flowchart of yes/no questions about the features, and each leaf of the flowchart is a predicted label. It is a great first model because it is easy to picture: it just keeps asking questions like "is feature 1 greater than 5?" until it can decide.
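To make the flowchart picture concrete, here is a hand-written sketch of what a very small tree amounts to once it has been trained. The function name `tiny_tree_predict` and the threshold 5 are illustrative assumptions taken from the example question above, not something scikit-learn generates.

```python
def tiny_tree_predict(row):
    """Hand-written stand-in for a trained tree: one question, two leaves."""
    feature_1 = row[1]      # "feature 1" is the second column (zero-based index 1)
    if feature_1 > 5:       # the yes/no question at the root
        return 1            # leaf reached on "yes"
    return 0                # leaf reached on "no"

rows = [[1, 2], [8, 9]]
hand_preds = [tiny_tree_predict(r) for r in rows]
print(hand_preds)
```

A real tree differs only in that the questions and thresholds are chosen from the training data instead of being typed in by hand.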

The code follows the fit/predict ritual from lesson one:

from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)               # learn the questions
clf.predict(X_new)          # classify new rows

We pass random_state=0 for reproducibility. As it builds each split, scikit-learn randomly permutes the features it considers. When two candidate splits are equally good, that ordering decides which one wins, so two unseeded runs can grow slightly different trees. Fixing the seed pins that choice so you get the same tree every time.
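One way to see the effect of the seed is to look at the first question a fitted tree asks. The sketch below uses the eight-row dataset from this lesson. The label list `y` is an assumption written out from the comments on that dataset, and the helper `root_split` is a hypothetical name. It reads the fitted tree's `tree_` attribute, where index 0 is the root node.

```python
from sklearn.tree import DecisionTreeClassifier

X = [[1, 1], [2, 1], [1, 2], [2, 2],
     [8, 8], [9, 8], [8, 9], [9, 9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # assumed labels: low/low -> 0, high/high -> 1

def root_split(seed):
    """Fit a tree with the given seed; return (feature index, threshold) of its first question."""
    tree = DecisionTreeClassifier(random_state=seed).fit(X, y)
    return int(tree.tree_.feature[0]), float(tree.tree_.threshold[0])

# Same seed, two separate fits: the root question is identical.
same_seed_twice = (root_split(0), root_split(0))
print(same_seed_twice)

# Different seeds may or may not agree, because both features split this data equally well.
roots_by_seed = {seed: root_split(seed) for seed in range(5)}
print(roots_by_seed)
```

Which feature each seed picks can depend on the scikit-learn version, so no specific output is promised here. The point is only that a fixed seed gives the same answer on every run.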

You will train on a tiny, clean dataset. Each row has two features. The label is 0 when both features are small and 1 when both are large, with a clear gap in between:

X = [[1,1],[2,1],[1,2],[2,2],   # all label 0 (low/low)
     [8,8],[9,8],[8,9],[9,9]]   # all label 1 (high/high)
y = [0,0,0,0,1,1,1,1]           # one label per row, in the same order as X

Because the two groups are perfectly separated, the tree can learn a rule that gets every training row right. That will not always happen with messy real data, and the next lessons are all about what to do when it does not. But it makes a clean first run.
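The claim that the tree gets every training row right can be checked directly. This is a minimal sketch that assumes the labels `y` follow the comments on the dataset. The names `feature_0` and `feature_1` are made up for display only. `score` reports accuracy on the rows it is given, and `export_text` prints the learned questions as text.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[1, 1], [2, 1], [1, 2], [2, 2],
     [8, 8], [9, 8], [8, 9], [9, 9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # assumed labels: low/low -> 0, high/high -> 1

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# Fraction of training rows the tree classifies correctly.
train_accuracy = clf.score(X, y)
print(train_accuracy)

# Number of questions on the longest path from the root to a leaf.
depth = clf.get_depth()
print(depth)

# Text view of the learned flowchart.
print(export_text(clf, feature_names=["feature_0", "feature_1"]))
```

Because a single threshold anywhere in the gap separates the two groups, one question is enough here. On messier data the tree would need more questions and might still miss some rows.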

One note on the wider AI world: heavyweight tools like TensorFlow and PyTorch cannot run in this in-browser Python, so this course stays with scikit-learn, which is plenty to learn the real ideas.

Your turn

Features X (8 rows, 2 columns) and labels y are given. Create a DecisionTreeClassifier(random_state=0) named clf, fit it on X and y, then predict on X and store the result in preds. Also predict the single new point [[9, 9]] and store just that one label (an int) in new_pred.
