Random Forests: Wisdom of the Crowd
A single decision tree is easy to understand but fragile: it overfits readily, and small changes in the training data can change its predictions a lot. A random forest fixes this with a simple, powerful idea: train many different trees and let them vote.
Each tree in the forest is grown on a bootstrap sample of the training data (rows drawn at random, with replacement) and considers only a random subset of the features at each split. As a result, every tree is a little different and makes different mistakes. When you combine their votes, the individual errors tend to cancel out, provided the trees do not all make the same mistakes. This is the wisdom of the crowd: a group of diverse, imperfect guessers often beats any single expert.
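To make the two sources of randomness concrete, here is a minimal hand-built forest. It is a sketch, not scikit-learn's implementation, and it assumes a synthetic dataset from make_classification in place of real data. It draws bootstrap samples with NumPy, uses DecisionTreeClassifier(max_features="sqrt") so each split sees a random subset of features, and combines the trees by simple majority vote. RandomForestClassifier instead averages the trees' predicted probabilities, so its predictions can differ slightly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data stands in for a real dataset (assumption for this sketch).
X, y = make_classification(n_samples=600, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
trees = []
for _ in range(50):
    # Bootstrap sample: len(X_train) row indices drawn with replacement.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # max_features="sqrt": each split considers a random subset of about sqrt(20) features.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=int(rng.integers(1_000_000)))
    tree.fit(X_train[idx], y_train[idx])
    trees.append(tree)

# Every tree votes 0 or 1; the majority wins (an exact tie goes to class 0).
votes = np.stack([t.predict(X_test) for t in trees])  # shape (50, n_test)
bagged_pred = (votes.mean(axis=0) > 0.5).astype(int)

single_acc = (trees[0].predict(X_test) == y_test).mean()
bagged_acc = (bagged_pred == y_test).mean()
print(f"one tree: {single_acc:.3f}  hand-built forest: {bagged_acc:.3f}")
```

Comparing the two printed numbers shows the effect of the vote: the combined prediction is usually noticeably more accurate than any one of the randomized trees.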
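from sklearn.ensemble import RandomForestClassifier

# Grow 50 trees; a fixed random_state makes the random sampling repeatable
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X_train, y_train)  # X_train, y_train: the training split

n_estimators is the number of trees to grow (50 here; scikit-learn's default is 100). Adding trees rarely hurts accuracy, but the gains shrink and eventually level off while training and prediction get slower. As always, random_state=0 makes the result reproducible, since the forest's bootstrap sampling and feature selection are random.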
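A quick way to see the diminishing returns is to sweep n_estimators and watch the test accuracy level off. This sketch again assumes a synthetic make_classification dataset, and the specific values tried (1, 10, 50, 200) are arbitrary choices for illustration. It also checks that two forests built with the same random_state on the same data make identical predictions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the lesson's data (assumption).
X, y = make_classification(n_samples=600, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Test accuracy for forests of increasing size.
scores = {}
for n in [1, 10, 50, 200]:
    forest = RandomForestClassifier(n_estimators=n, random_state=0)
    forest.fit(X_train, y_train)
    scores[n] = forest.score(X_test, y_test)
    print(f"n_estimators={n:>3}: test accuracy {scores[n]:.3f}")

# Same random_state and same data -> identical forests and identical predictions.
a = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
b = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
same = bool((a.predict(X_test) == b.predict(X_test)).all())
print("reproducible:", same)
```

The jump from 1 to 10 trees is typically the largest; beyond a few dozen trees the score tends to change little while fitting time keeps growing roughly in proportion to the number of trees.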
The headline benefit: a forest typically generalizes better than a single tree, meaning a higher score on the held-out test set. It is one of the most reliable off-the-shelf models, and a great default to reach for.
You will train one plain decision tree and one random forest on the same split, score both on the test set, and confirm the forest does at least as well.
A dataset and train/test split are already prepared. Create tree = DecisionTreeClassifier(max_depth=3, random_state=0) and forest = RandomForestClassifier(n_estimators=50, random_state=0), and fit both on the training data. Score each on the test set with accuracy_score, storing the results in tree_acc and forest_acc.
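One possible way to complete the exercise is sketched below. The lesson's prepared dataset is not shown here, so the sketch assumes a synthetic make_classification dataset and a default train_test_split; with the real data the exact accuracies will differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the prepared dataset and split (assumption).
X, y = make_classification(n_samples=600, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One shallow tree versus a forest of 50 trees, trained on the same split.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
forest = RandomForestClassifier(n_estimators=50, random_state=0)
tree.fit(X_train, y_train)
forest.fit(X_train, y_train)

# accuracy_score(y_true, y_pred): fraction of test labels predicted correctly.
tree_acc = accuracy_score(y_test, tree.predict(X_test))
forest_acc = accuracy_score(y_test, forest.predict(X_test))
print(f"tree: {tree_acc:.3f}  forest: {forest_acc:.3f}")
print("forest at least as good:", forest_acc >= tree_acc)
```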