pandas: Series and DataFrame
NumPy is great for numbers, but real data often has labelled columns of mixed types: names, dates, prices. That is what pandas is for. Its main object is the DataFrame: a table with named columns and an index for rows.
The easiest way to build one is from a dict, where each key is a column name and each value is a list of cells:
import pandas as pd
df = pd.DataFrame({
    "name": ["Ada", "Bo", "Cy"],
    "age": [30, 25, 35],
    "city": ["NYC", "LA", "NYC"],
})
print(df)

A few essentials for sizing up any table:
print(df.shape) # (3, 3) rows, columns
print(df.columns) # Index(['name', 'age', 'city'], ...)
print(df.head(2)) # the first 2 rows
print(df.dtypes) # the type of each column

A single column is a Series (a 1D labelled array). Grab one with bracket notation:
ages = df["age"]
print(ages.mean()) # 30.0
print(ages.max()) # 35

df.shape is a tuple (n_rows, n_cols), so df.shape[0] is the row count.
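
As a small illustration of both points, the sketch below reuses the lesson's three-row table, unpacks df.shape into two variables, and checks that a single column really is a Series. The names n_cols and the use of len(df) are additions made here for illustration, not part of the lesson's own code.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ada", "Bo", "Cy"],
    "age": [30, 25, 35],
    "city": ["NYC", "LA", "NYC"],
})

# shape is a plain tuple, so it can be unpacked into two names
n_rows, n_cols = df.shape
print(n_rows, n_cols)          # 3 3

# len(df) is another way to get the row count
print(len(df) == df.shape[0])  # True

# a single column is a Series, so its shape has only one entry
ages = df["age"]
print(type(ages).__name__)     # Series
print(ages.shape)              # (3,)
```

Unpacking reads a little more clearly when both counts are needed; indexing with df.shape[0] is enough when only the row count matters.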
Build a DataFrame df from this dict: {"name": ["Ada", "Bo", "Cy", "Di"], "score": [88, 72, 95, 60]}. Save the number of rows into n_rows (use df.shape). Then save the mean of the score column into avg_score.