Your First Machine Learning Models

Random Forests: Wisdom of the Crowd

A single decision tree is easy to understand but fragile: it overfits readily, and its predictions can swing a lot when the training data changes only slightly. A random forest fixes this with a simple, powerful idea: train many different trees and let them vote.

Each tree in the forest is grown on a slightly different random sample of the training data (rows drawn with replacement, so some repeat and some are left out) and considers only a random subset of features at each split. As a result, every tree is a little different and makes different mistakes. When you combine their votes, the individual errors tend to cancel out and the random quirks wash away. This is the wisdom of the crowd: a group of diverse, imperfect guessers often beats any single expert.
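To make the voting idea concrete, here is a minimal hand-rolled sketch of the same recipe. It is not how scikit-learn implements its forest internally (RandomForestClassifier averages each tree's predicted probabilities rather than counting hard votes), and the synthetic dataset, the variable names, and the choice of 25 trees are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; the lesson supplies its own dataset).
X, y = make_classification(n_samples=600, n_features=12, n_informative=6,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)
n_trees = 25  # odd, so a two-class vote can never tie

trees = []
for i in range(n_trees):
    # Bootstrap sample: draw training rows with replacement.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # max_features="sqrt" makes each split look at a random subset of features.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X_train[idx], y_train[idx])
    trees.append(tree)

# One row of predictions per tree: shape (n_trees, n_test_samples).
all_preds = np.array([t.predict(X_test) for t in trees])

# Majority vote for 0/1 labels: predict 1 when more than half the trees say 1.
majority = (all_preds.mean(axis=0) > 0.5).astype(int)

individual_acc = (all_preds == y_test).mean(axis=1)
vote_acc = (majority == y_test).mean()

print("mean accuracy of individual trees:", round(individual_acc.mean(), 3))
print("accuracy of the majority vote:   ", round(vote_acc, 3))
```

On most datasets the majority vote scores higher than the average individual tree, though the exact numbers depend on the data and the seeds.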

from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X_train, y_train)

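n_estimators is how many trees to grow (50 here; scikit-learn's default is 100). Adding trees usually improves the score with diminishing returns, while training and prediction time keep growing with the number of trees. As always, random_state=0 keeps the result reproducible, because the forest's row and feature sampling is random.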
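If you want to see both effects for yourself, a short experiment along these lines can help. It is only a sketch: the synthetic dataset and the particular tree counts are assumptions, and the scores you get will differ on other data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; the lesson supplies its own dataset).
X, y = make_classification(n_samples=600, n_features=12, n_informative=6,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Test accuracy for a few forest sizes.
scores = {}
for n in (1, 5, 20, 50):
    forest = RandomForestClassifier(n_estimators=n, random_state=0)
    forest.fit(X_train, y_train)
    scores[n] = accuracy_score(y_test, forest.predict(X_test))
    print(f"n_estimators={n:>2}  test accuracy={scores[n]:.3f}")

# Same data + same random_state -> the same forest, hence the same predictions.
pred_a = RandomForestClassifier(n_estimators=20, random_state=0).fit(
    X_train, y_train).predict(X_test)
pred_b = RandomForestClassifier(n_estimators=20, random_state=0).fit(
    X_train, y_train).predict(X_test)
same_seed_agree = bool((pred_a == pred_b).all())
print("identical predictions with the same seed:", same_seed_agree)
```

Try changing random_state on one of the last two forests: the predictions will usually stop matching exactly, which is why fixing the seed matters when you compare models.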

The headline benefit: a forest typically generalizes better than a single tree, meaning a higher score on the held-out test set. It is one of the most reliable off-the-shelf models, and a great default to reach for.

You will train one plain decision tree and one random forest on the same split, score both on the test set, and confirm the forest does at least as well.

Your turn

A dataset and train/test split are prepared. Train tree = DecisionTreeClassifier(max_depth=3, random_state=0) and forest = RandomForestClassifier(n_estimators=50, random_state=0) on the training data. Score both on the test set with accuracy_score, storing the results in tree_acc and forest_acc.
