Grouping and Merging
Two operations unlock most data analysis: groupby (split rows into groups and summarise each) and merge (join two tables on a shared key).
groupby follows a split-apply-combine pattern. Group by a column, then aggregate:
import pandas as pd
sales = pd.DataFrame({
"region": ["N", "S", "N", "S"],
"amount": [100, 200, 150, 50],
})
print(sales.groupby("region")["amount"].sum())
# region
# N 250
# S 250Use .agg to compute several summaries at once:
print(sales.groupby("region")["amount"].agg(["sum", "mean", "count"]))merge joins tables that share a column. Here both tables have region:
info = pd.DataFrame({
"region": ["N", "S"],
"manager": ["Ada", "Bo"],
})
joined = sales.merge(info, on="region")
print(joined) # each sales row gains its region's managerThe default is an inner join: only keys present in both tables survive. Pass how="left" to keep every row of the left table even when there is no match.
Two DataFrames are given. First, group orders by customer and sum the total column, storing the result Series in by_customer. Second, merge orders with customers on the customer column (inner join) and store the merged DataFrame in merged.
This lesson is locked
Lessons open one at a time. Finish the previous lesson to unlock this one.