Data Foundations: numpy & pandas

pandas: Series and DataFrame

NumPy is great for arrays of numbers, but real data usually comes as labelled columns of mixed types: names, dates, prices. That is what pandas is for. Its main object is the DataFrame: a table with named columns and an index that labels the rows.

The easiest way to build one is from a dict, where each key is a column name and each value is the list of that column's cells (all lists must be the same length):

import pandas as pd
df = pd.DataFrame({
    "name": ["Ada", "Bo", "Cy"],
    "age":  [30, 25, 35],
    "city": ["NYC", "LA", "NYC"],
})
print(df)
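
To make the "index for rows" concrete, here is a minimal sketch that looks at the labels pandas assigns when you do not supply any. It rebuilds the same table so it runs on its own; the variable names row_labels and col_names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ada", "Bo", "Cy"],
    "age":  [30, 25, 35],
    "city": ["NYC", "LA", "NYC"],
})

# With no explicit index, pandas labels the rows 0, 1, 2, ...
row_labels = list(df.index)
print(row_labels)   # [0, 1, 2]

# Column names come from the dict keys, in insertion order
col_names = list(df.columns)
print(col_names)    # ['name', 'age', 'city']
```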

A few essentials for sizing up any table:

print(df.shape)     # (3, 3)   rows, columns
print(df.columns)   # Index(['name', 'age', 'city'], ...)
print(df.head(2))   # the first 2 rows
print(df.dtypes)    # the type of each column
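
As a rough sketch of why df.dtypes matters, the snippet below checks which columns are numeric. It assumes the same three-column table as above; the exact dtype name shown for text columns can differ between pandas versions, so the example tests for "numeric or not" rather than a specific name.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ada", "Bo", "Cy"],
    "age":  [30, 25, 35],
    "city": ["NYC", "LA", "NYC"],
})

# A column built from Python ints is stored with an integer dtype
age_is_int = pd.api.types.is_integer_dtype(df["age"])
print(age_is_int)        # True

# Text columns are not numeric
name_is_numeric = pd.api.types.is_numeric_dtype(df["name"])
print(name_is_numeric)   # False

# Keep only the numeric columns
numeric_cols = list(df.select_dtypes(include="number").columns)
print(numeric_cols)      # ['age']
```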

A single column is a Series (a 1D labelled array). Grab one with bracket notation:

ages = df["age"]
print(ages.mean())   # 30.0
print(ages.max())    # 35
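
One detail that often trips people up: single brackets return a Series, while a list of column names inside the brackets returns a DataFrame. The sketch below illustrates the difference on the same table; ages_table is a name made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ada", "Bo", "Cy"],
    "age":  [30, 25, 35],
    "city": ["NYC", "LA", "NYC"],
})

ages = df["age"]          # single brackets: a Series
ages_table = df[["age"]]  # list of names: a one-column DataFrame

print(type(ages).__name__)         # Series
print(type(ages_table).__name__)   # DataFrame
print(ages.name)                   # age
print(ages_table.shape)            # (3, 1)
print(ages.min(), ages.sum())      # 25 90
```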

df.shape is a tuple (n_rows, n_cols), so df.shape[0] is the row count.
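
Because the shape is an ordinary tuple, it can be indexed or unpacked like any other. A minimal sketch on the lesson's three-row table:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ada", "Bo", "Cy"],
    "age":  [30, 25, 35],
    "city": ["NYC", "LA", "NYC"],
})

# Unpack both dimensions at once
n_rows, n_cols = df.shape
print(n_rows, n_cols)   # 3 3

# len(df) is another way to get the row count
print(len(df))          # 3
```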

Your turn

Build a DataFrame df from this dict: {"name": ["Ada", "Bo", "Cy", "Di"], "score": [88, 72, 95, 60]}. Save the number of rows into n_rows (use df.shape). Then save the mean of the score column into avg_score.
