In data analytics, you often need to summarize data, such as total sales per region or average order value per customer. Pandas makes this easy with .groupby() and aggregation functions.
To group data by a specific column and calculate the sum:
df.groupby("region")["sales"].sum()
This groups the data by region and then sums the sales for each group.
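As a minimal sketch of the call above, assuming a small made-up DataFrame (the region names and sales figures are invented for illustration, not taken from any real dataset):

```python
import pandas as pd

# Hypothetical sample data; the values are invented for illustration.
df = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South"],
    "sales": [100, 150, 200, 50, 250],
})

# Split the rows by region, then sum the sales column within each group.
sales_by_region = df.groupby("region")["sales"].sum()
print(sales_by_region)
```

The result is a Series with one entry per region (East 50, North 300, South 400 for this sample). By default, groupby sorts the group keys, so the regions come out in alphabetical order.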
Common aggregation functions:
.sum() – total value
.mean() – average
.count() – number of entries
.max() / .min() – highest or lowest
You can apply multiple aggregations like this:
df.groupby("region")["sales"].agg(["sum", "mean", "count"])
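Here is a sketch of multiple aggregations on made-up data. The second call uses pandas named aggregation to choose the output column names; total, average, and orders are arbitrary labels picked for this example.

```python
import pandas as pd

# Hypothetical sample data; the values are invented for illustration.
df = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South"],
    "sales": [100, 150, 200, 50, 250],
})

# Passing a list produces one output column per aggregation.
summary = df.groupby("region")["sales"].agg(["sum", "mean", "count"])
print(summary)

# Named aggregation: keyword = output column name, value = function to apply.
named = df.groupby("region")["sales"].agg(
    total="sum", average="mean", orders="count"
)
print(named)
```

Both results are DataFrames with one row per region. Note that count only counts non-missing values, so it can be smaller than the number of rows in a group when sales contains NaN.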
Grouped results use the group keys (here, region) as the index instead of the usual row numbers. To turn the keys back into an ordinary column and get a normal DataFrame:
grouped = df.groupby("region")["sales"].sum().reset_index()
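To see what reset_index() changes, this sketch compares the result before and after the call on made-up data. It also shows as_index=False, a groupby parameter that keeps the group keys as a column from the start; treat it as one possible alternative rather than the required approach.

```python
import pandas as pd

# Hypothetical sample data; the values are invented for illustration.
df = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South"],
    "sales": [100, 150, 200, 50, 250],
})

# Before: a Series whose index holds the region names.
by_region = df.groupby("region")["sales"].sum()
print(by_region.index.name)

# After: region becomes a regular column and the index is 0, 1, 2, ...
grouped = by_region.reset_index()
print(list(grouped.columns))

# Alternative: tell groupby not to move the keys into the index.
grouped_alt = df.groupby("region", as_index=False)["sales"].sum()
print(grouped_alt)
```

Having region as a column is usually more convenient when you want to merge the summary with other tables, sort it, or export it to a file.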
Grouping and aggregating help you uncover patterns, compare performance, and summarize large datasets, all essential skills for a data analyst.