Data Foundations: numpy & pandas

Project: Analyze a CSV

Time to put it together. You will write a tiny CSV file in memory, read it back with pandas, and compute grouped aggregates, the way a real analysis starts.

pandas reads CSV text from a file or from an in-memory buffer. We use io.StringIO so nothing touches disk:

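import io
import pandas as pd

# A header line plus two data rows, held in a string; \n separates the lines.
csv_text = "name,dept,salary\nAda,eng,100\nBo,eng,140\n"
# StringIO wraps the string in a file-like object that read_csv accepts.
df = pd.read_csv(io.StringIO(csv_text))
print(df)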

From there you already know the moves: filter with a boolean mask, group with groupby(...).agg(...), and pull a single value out of a Series by its label, either with .loc[label] or with plain series[label] indexing.
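As a quick refresher, the three moves can be sketched as follows. The sample rows (including the "ops" dept), the salary threshold of 95, and the variable names are assumptions made for illustration; they are not the data the exercise grades.

```python
import io
import pandas as pd

# Hypothetical sample data, not the lesson's graded input.
csv_text = "name,dept,salary\nAda,eng,100\nBo,eng,140\nCy,ops,90\n"
df = pd.read_csv(io.StringIO(csv_text))

# Filter with a boolean mask: keep rows where salary exceeds 95.
high_paid = df[df["salary"] > 95]

# Group and aggregate: mean and max salary for each dept.
stats = df.groupby("dept")["salary"].agg(["mean", "max"])

# Pull single values out of a Series by label.
mean_by_dept = stats["mean"]         # a Series indexed by dept
eng_mean = mean_by_dept.loc["eng"]   # explicit label lookup with .loc
ops_mean = mean_by_dept["ops"]       # plain indexing by label

print(high_paid)
print(stats)
print(eng_mean, ops_mean)
```

Both lookup styles return the same kind of scalar here; .loc just makes it explicit that the lookup is by label and not by position.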

Your job: read the data, then build a single results dict that summarises it. Returning a plain dict is a common, testable way to hand analysis output to the rest of a program (or to a report, or to an LLM prompt). We grade the numbers in that dict, not any printout.

What goes in results

  • "n_rows": the number of rows in the table
  • "total_salary": the sum of the whole salary column
  • "avg_by_dept": a dict mapping each dept to its average salary (build it from a groupby mean, then .to_dict())
  • "top_dept": the dept with the highest average salary
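One possible shape for such a dict is sketched below on made-up rows, so treat it as an illustration and not as the expected answer. The sample data, the helper name avg, the use of idxmax() to find the top dept, and the int(...) conversion are all assumptions; the lesson does not prescribe them.

```python
import io
import pandas as pd

# Hypothetical sample data, not the lesson's graded input.
csv_text = "name,dept,salary\nAda,eng,100\nBo,eng,140\nCy,ops,90\nDi,ops,110\n"
df = pd.read_csv(io.StringIO(csv_text))

# Mean salary per dept, as a Series indexed by dept.
avg = df.groupby("dept")["salary"].mean()

results = {
    "n_rows": len(df),                        # number of rows in the table
    "total_salary": int(df["salary"].sum()),  # int() is optional; it yields a plain Python int
    "avg_by_dept": avg.to_dict(),             # {dept: mean salary}
    "top_dept": avg.idxmax(),                 # label of the largest mean
}

print(results)
```

Computing the grouped mean once and reusing it for both "avg_by_dept" and "top_dept" keeps the two values consistent with each other.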

Your turn

The CSV text is provided in csv_text. Read it into a DataFrame df with pd.read_csv(io.StringIO(csv_text)). Then build a dict named results with these keys: "n_rows" (row count), "total_salary" (sum of the salary column), "avg_by_dept" (a dict of dept to mean salary), and "top_dept" (the dept whose average salary is highest).
