DataDucky

Why Group Data?

In data analytics, you often need to summarize data, such as total sales per region or average order value per customer. pandas handles this with .groupby() combined with aggregation functions.

Grouping Data

To group data by a specific column and calculate the sum:

df.groupby("region")["sales"].sum()

This splits the rows by region, sums the sales within each group, and returns a Series indexed by region.
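To see the grouping in action, here is a minimal sketch on a small made-up dataset. The region names and sales figures are invented for illustration; only the "region" and "sales" column names come from the lesson.

```python
import pandas as pd

# Hypothetical sample data; the values are made up for this example.
df = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South"],
    "sales": [100, 200, 150, 300, 50],
})

# Group rows by region, then sum the sales within each group.
sales_by_region = df.groupby("region")["sales"].sum()
print(sales_by_region)
```

By default the group labels are sorted, so the result lists East, North, and South in that order, each with its total.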

Common Aggregation Functions

.sum() – total value
.mean() – average
.count() – number of non-null entries
.max() / .min() – highest or lowest

You can apply multiple aggregations like this:

df.groupby("region")["sales"].agg(["sum", "mean", "count"])
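The following sketch runs the multi-aggregation call on made-up data that includes one missing value, to show how each function treats it. The sample rows are an assumption for illustration, not data from the lesson.

```python
import pandas as pd

# Hypothetical sample data with one missing sales value (NaN) in the North region.
df = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South"],
    "sales": [100.0, 200.0, float("nan"), 300.0, 50.0],
})

# One row per region, one column per aggregation.
summary = df.groupby("region")["sales"].agg(["sum", "mean", "count"])
print(summary)
```

The result is a DataFrame with the columns sum, mean, and count. The count for North is 1 rather than 2, because .count() skips missing values, and the sum and mean ignore them as well.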

Resetting the Index

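The result of a grouped aggregation uses the group keys (here, the region values) as its index. To move the keys back into a regular column and get a DataFrame with a default integer index: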

grouped = df.groupby("region")["sales"].sum().reset_index()
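As a rough illustration of what changes, the sketch below compares the result before and after .reset_index() on made-up data. It also shows as_index=False, a groupby option that is not mentioned in the lesson but yields the same shape in one step.

```python
import pandas as pd

# Hypothetical sample data; the values are made up for this example.
df = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South"],
    "sales": [100, 200, 150, 300, 50],
})

# Before: a Series whose index holds the region labels.
totals = df.groupby("region")["sales"].sum()

# After: a DataFrame with "region" and "sales" as ordinary columns.
grouped = totals.reset_index()
print(grouped)

# Alternative (not from the lesson): keep the keys as a column from the start.
alt = df.groupby("region", as_index=False)["sales"].sum()
```

Having region as a normal column makes the result easier to merge with other tables, sort, or export.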

Why This Matters in Analytics

Grouping and aggregating help you uncover patterns, compare performance, and summarize large datasets — all essential skills for a data analyst.

