Your First Machine Learning Models

The Machine Learning Mental Model

Machine learning sounds mystical, but the core idea is plain: instead of writing the rules yourself, you show a program lots of examples and let it find the rules. Almost everything in this module rests on three ideas.

Features and labels

An example is split into two parts. The features are what you know going in (a house's size, number of bedrooms). The label is what you want to predict (its price). By long tradition, features are called X (a table, capital letter) and the label is called y (a single column, lowercase).

      X (features)          y (label)
  size_sqft  bedrooms        price_k
     650        1              220
     800        2              265
    1200        3              410
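
As a rough illustration, the same three rows can be written with plain Python lists. This is only a sketch of the shapes involved: the names n_rows and n_cols are made up here, and the lesson's own exercise uses a DataFrame rather than lists.

```python
# Each inner list is one example (a row); each position is one feature (a column).
X = [
    [650, 1],    # size_sqft, bedrooms
    [800, 2],
    [1200, 3],
]

# One label per example, in the same row order as X.
y = [220, 265, 410]   # price_k

n_rows = len(X)       # how many examples there are
n_cols = len(X[0])    # how many features each example has

# X and y must line up: exactly one label for every row of features.
assert len(y) == n_rows
print(n_rows, n_cols)  # 3 2
```

X is two-dimensional (rows by columns) while y is a flat sequence, which is what the capital and lowercase letters hint at.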

Supervised vs unsupervised

  • Supervised learning has labels. You know the right answer for each training example, so the model learns to map X to y. Predicting a price and sorting email into spam vs not-spam are both supervised tasks.
  • Unsupervised learning has no labels. You have only X and ask the model to find structure, such as grouping similar customers.

Everything in this module is supervised learning.

The fit / predict pattern

Every supervised scikit-learn model speaks the same two verbs. You will see this shape again and again:

model = SomeModel()
model.fit(X, y)          # learn from examples
model.predict(X_new)     # guess labels for new rows

That is the entire ritual. The hard part is never the API; it is choosing good features and honestly measuring how well the model does, which is what the rest of this module is about.
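
For reference, here is one way the pattern might look with a concrete model. This is a minimal sketch, not the lesson's exercise: it assumes scikit-learn is installed, picks LinearRegression as a stand-in for SomeModel, and invents a single new house for X_new.

```python
from sklearn.linear_model import LinearRegression

# The three example rows from the housing table: [size_sqft, bedrooms]
X = [[650, 1], [800, 2], [1200, 3]]
y = [220, 265, 410]   # price_k, one label per row

model = LinearRegression()   # stands in for SomeModel
model.fit(X, y)              # learn from examples

# X_new is a made-up house (1000 sqft, 2 bedrooms), for illustration only.
X_new = [[1000, 2]]
prediction = model.predict(X_new)   # one predicted price_k per row of X_new
print(prediction)
```

With only three rows and two features, this model can reproduce the training prices exactly, which says nothing about how it would do on houses it has never seen. That gap is why honest measurement matters.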

In this exercise you will not train anything yet. You will just do the unglamorous but essential first step: take a small table and carve it into X and y.

Your turn

A small housing DataFrame df is given with columns size_sqft, bedrooms, and price_k.

  • Set feature_cols = ["size_sqft", "bedrooms"].
  • Build X = df[feature_cols] (the two input columns only) and y = df["price_k"] (the label).
  • Set n_features to the number of feature columns and n_samples to the number of rows.
