Data Foundations: numpy & pandas

Selecting, Filtering and Sorting

There are two main selectors. .loc selects by label (column names, index values); .iloc selects by integer position, ignoring labels.

import pandas as pd
df = pd.DataFrame({
    "name": ["Ada", "Bo", "Cy"],
    "age":  [30, 25, 35],
})
print(df.loc[0, "name"])   # Ada    (row label 0, column 'name')
print(df.iloc[0])          # the first row by position
print(df.iloc[:, 1])       # the second column by position
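
With the default index 0, 1, 2, labels and positions happen to coincide, which can hide the difference between the two selectors. The sketch below assumes a hypothetical variant of the table, by_name, that uses the name column as its row labels; that variable is illustrative and not part of the lesson's data.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ada", "Bo", "Cy"],
    "age":  [30, 25, 35],
})

# Hypothetical variant: use the 'name' column as the row labels.
by_name = df.set_index("name")

print(by_name.loc["Bo", "age"])   # label-based: row 'Bo', column 'age'
print(by_name.iloc[1, 0])         # position-based: second row, first column

# Slices differ too: .loc includes the end label, .iloc excludes the end position.
print(by_name.loc["Ada":"Bo"])    # two rows: Ada and Bo
print(by_name.iloc[0:1])          # one row: Ada
```

Both of the first two prints reach the same cell (Bo's age), one by name and one by position.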

The workhorse is the boolean mask. Write a condition on a column to get a True/False Series, then index the DataFrame with it to keep only the True rows:

mask = df["age"] >= 30
print(df[mask])            # only Ada and Cy
# or in one line:
print(df[df["age"] >= 30])
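
It can help to look at the mask itself before using it. A minimal sketch, reusing the same three-row table: the mask is an ordinary Series of booleans, so it can be counted, and it also works inside .loc when only some columns are wanted.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ada", "Bo", "Cy"],
    "age":  [30, 25, 35],
})

mask = df["age"] >= 30
print(mask.tolist())          # [True, False, True]
print(mask.sum())             # 2, because True counts as 1

# A mask can pick the rows while .loc picks the column.
print(df.loc[mask, "name"])   # the names Ada and Cy
```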

Combine conditions with & (and) and | (or). Wrap each condition in parentheses:

print(df[(df["age"] >= 26) & (df["age"] <= 34)])   # just Ada
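
The parentheses matter because Python evaluates & and | before comparisons such as >= and <=, so an unparenthesized condition is grouped differently from how it reads and typically raises an error. The sketch below shows the other two operators on the same table: | for "or" and ~ for "not". The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ada", "Bo", "Cy"],
    "age":  [30, 25, 35],
})

# Either condition may hold: younger than 26 OR older than 34.
young_or_old = df[(df["age"] < 26) | (df["age"] > 34)]
print(young_or_old)       # Bo and Cy

# ~ flips a mask: keep rows where age is NOT >= 30.
not_thirty_plus = df[~(df["age"] >= 30)]
print(not_thirty_plus)    # just Bo
```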

Sort with sort_values. Use ascending=False for largest first:

print(df.sort_values("age", ascending=False))   # Cy, Ada, Bo
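
Two details are easy to miss here: sort_values returns a new DataFrame rather than changing df, and the sorted rows keep their original index labels. A minimal sketch, with by_age and renumbered as illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ada", "Bo", "Cy"],
    "age":  [30, 25, 35],
})

by_age = df.sort_values("age", ascending=False)
print(list(by_age.index))   # [2, 0, 1]: rows keep their original labels
print(list(df["name"]))     # ['Ada', 'Bo', 'Cy']: df itself is unchanged

# Optional: renumber the rows 0, 1, 2 after sorting.
renumbered = by_age.reset_index(drop=True)
print(renumbered)
```

Because the result is a new object, it has to be assigned to a variable to be kept.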

Your turn

Given the DataFrame below (already built for you), keep only the rows where price is greater than 20 and store the result in pricey. Then sort pricey by price from highest to lowest, storing it in pricey_sorted.
