Project: Classify Flowers End to End
Time to put the whole pipeline together yourself, with no scaffolding. This is the exact workflow you would use on a real problem.
You are given a small, iris-like flower dataset as a DataFrame named data. Each row has two features, petal_len and petal_wid, and a label species that is 0, 1, or 2 for three flower types. The three species occupy clearly different petal sizes, so a good model should separate them well.
Your job is the full sequence you have learned, in order:
- Split features and label into X and y.
- Hold out a test set with train_test_split.
- Train a model on the training set.
- Predict on the test set.
- Evaluate with accuracy on that test set.
Two details for a clean, reproducible run. Pass random_state=0 everywhere randomness appears (the split and the model). And pass stratify=y to train_test_split so the test set keeps the same mix of the three species rather than, say, leaving one species out by chance, which matters when each class is small.
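To see what stratify=y does, here is a minimal sketch on made-up labels. The 36-row toy frame and the row_id column are assumptions for illustration only, not the lesson's data. It compares the species counts in a stratified test set with those from a plain random split.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in labels: 12 rows per species (NOT the lesson's data).
y = pd.Series(np.repeat([0, 1, 2], 12), name="species")
X = pd.DataFrame({"row_id": np.arange(len(y))})

# With stratify=y, the 25% test set keeps the same 1:1:1 class mix as y.
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
strat_counts = y_test_strat.value_counts().reindex([0, 1, 2], fill_value=0)

# Without stratify, the class mix in the test set is left to chance.
_, _, _, y_test_plain = train_test_split(X, y, test_size=0.25, random_state=0)
plain_counts = y_test_plain.value_counts().reindex([0, 1, 2], fill_value=0)

# Same random_state, same arguments: the split is reproduced exactly.
_, _, _, y_test_again = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
same_split = y_test_strat.index.equals(y_test_again.index)

print("stratified:", strat_counts.tolist())
print("plain:     ", plain_counts.tolist())
print("reproducible:", same_split)
```

The stratified test set always has three rows of each species here, because 9 test rows are divided in proportion to the labels. The plain split has no such guarantee, and the gap tends to matter more as classes get smaller.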
This is the same loop, scaled up, that runs behind spam filters, medical screens, and recommendation systems. The model here is small and the data is friendly, but the shape of the work is the same as on a real problem.
Using the given DataFrame data: set X = data[["petal_len", "petal_wid"]] and y = data["species"]. Split with train_test_split(X, y, test_size=0.25, random_state=0, stratify=y) into X_train, X_test, y_train, y_test. Train a RandomForestClassifier(n_estimators=50, random_state=0) named model on the training set, predict on X_test into predictions, and set acc = accuracy_score(y_test, predictions).
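The steps above can be sketched as follows. In the lesson, data is already provided. The synthetic frame built here (cluster centers, spreads, and 40 rows per species) is a hypothetical stand-in so the sketch runs on its own, and its accuracy will not necessarily match what you get on the real data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the lesson's `data`: three well-separated
# clusters of petal measurements. Skip this part when `data` is given.
rng = np.random.default_rng(0)
n_per_class = 40
centers = [(1.5, 0.3), (4.0, 1.3), (6.0, 2.2)]  # assumed (length, width) means
frames = []
for species, (len_mu, wid_mu) in enumerate(centers):
    frames.append(pd.DataFrame({
        "petal_len": rng.normal(len_mu, 0.2, n_per_class),
        "petal_wid": rng.normal(wid_mu, 0.1, n_per_class),
        "species": species,
    }))
data = pd.concat(frames, ignore_index=True)

# 1. Split features and label.
X = data[["petal_len", "petal_wid"]]
y = data["species"]

# 2. Hold out a stratified 25% test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# 3. Train on the training set only.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# 4. Predict on the held-out test set.
predictions = model.predict(X_test)

# 5. Evaluate: fraction of test rows labeled correctly.
acc = accuracy_score(y_test, predictions)
print(f"test accuracy: {acc:.3f}")
```

Note that the model never sees X_test or y_test during fit. That separation is what makes acc an estimate of performance on unseen flowers rather than a measure of memorization.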