Coding Ref

How to GroupBy Index in Pandas

To group a Pandas DataFrame by the values in its index, call the groupby method with the level argument.

The level argument takes the position or name of an index level (for example, level=0 for the first level), and groupby puts all rows that share the same value at that level into one group.

Example

For example, consider the following DataFrame:

main.py
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

This DataFrame has three columns, A, B, and C, and five rows of data. Because no index is passed, Pandas assigns the default integer index 0 through 4.

To group the data in this DataFrame by the values in the index, you could do the following:

main.py
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

# Group the data by the values in the index
grouped = df.groupby(level=0)

# Print the resulting DataFrameGroupBy object
print(grouped)

In the code above, the groupby method is called on the DataFrame, and level=0 specifies that the first (and here, only) level of the index is used for grouping.

This creates a DataFrameGroupBy object that can be used to perform various operations on the grouped data. Printing it shows only the object's type and memory address, not the grouped rows.
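Grouping by the index is easier to see when index values repeat. The following is a minimal sketch, assuming a hypothetical DataFrame whose index labels 'x' and 'y' each appear twice (this data is not part of the example above). It inspects the groups by iterating over the DataFrameGroupBy object:

```python
import pandas as pd

# Hypothetical data for illustration: the index labels 'x' and 'y' repeat
df = pd.DataFrame(
    {'A': [1, 2, 3, 4], 'B': [10, 20, 30, 40]},
    index=['x', 'x', 'y', 'y'],
)

# Group the rows that share the same index label
grouped = df.groupby(level=0)

# Number of distinct groups
print(grouped.ngroups)

# Iterating yields (index label, sub-DataFrame) pairs
for label, group in grouped:
    print(label)
    print(group)

# Number of rows in each group
sizes = grouped.size()
print(sizes)
```

Here the two rows labelled 'x' form one group and the two rows labelled 'y' form another.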

Apply a function to each group

You can then use the apply method to apply a function to each group in the DataFrameGroupBy object.

For example, you could use the apply method to calculate the mean for each group, as shown in the following example:

main.py
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

# Group the data by the values in the index
grouped = df.groupby(level=0)

# Calculate the mean for each group
mean = grouped.apply(lambda x: x.mean())

# Print the resulting DataFrame
print(mean)

output
     A     B      C
0  1.0  10.0  100.0
1  2.0  20.0  200.0
2  3.0  30.0  300.0
3  4.0  40.0  400.0
4  5.0  50.0  500.0

In the code above, the apply method calls the lambda function once per group, and the function returns the column means for that group. Pandas combines the results into a new DataFrame with one row per group and the three columns A, B, and C.

Because every index value in this DataFrame is unique, each group contains exactly one row, so the means equal the original values, converted to floats.
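For common aggregations, the built-in GroupBy methods are usually a simpler alternative to apply with a lambda. The sketch below assumes a hypothetical DataFrame with repeated index labels, so that each group holds more than one row. It uses mean and agg directly:

```python
import pandas as pd

# Hypothetical data for illustration: the index labels 'x' and 'y' repeat
df = pd.DataFrame(
    {'A': [1, 2, 3, 4], 'B': [10, 20, 30, 40]},
    index=['x', 'x', 'y', 'y'],
)

# Built-in aggregation: mean of every column per index label
mean = df.groupby(level=0).mean()
print(mean)

# agg accepts a different function per column
summary = df.groupby(level=0).agg({'A': 'sum', 'B': 'max'})
print(summary)
```

With this data, the mean of A is 1.5 for 'x' and 3.5 for 'y', because each group averages two rows.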

Group data by multiple levels of the index

You can also use the groupby method to group data by multiple levels of the index.

For example, if your DataFrame has a MultiIndex with two levels, you can group the data by the values in both levels, as shown in the following example:

main.py
import pandas as pd

# create a sample dataframe with a MultiIndex
df = pd.DataFrame({'A': ['a', 'a', 'b', 'b'],
                   'B': ['c', 'd', 'c', 'd'],
                   'C': [1, 2, 3, 4]})
df = df.set_index(['A', 'B'])

# group by levels A and B
grouped = df.groupby(level=['A', 'B'])

# apply a function to the groups
result = grouped.apply(lambda x: x.sum())

# display the result
print(result)

output
     C
A B
a c  1
  d  2
b c  3
  d  4

This groups the data by both levels of the index and applies the sum function to each group. The resulting DataFrame keeps the same MultiIndex, with one row per group. Because each combination of A and B appears only once here, each sum equals the original value of C.
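Grouping by only one level of a MultiIndex combines rows across the other level, which makes the aggregation visible. A minimal sketch that reuses the same sample data and passes a single level name to groupby:

```python
import pandas as pd

# Same sample data as above, with A and B as the two index levels
df = pd.DataFrame({'A': ['a', 'a', 'b', 'b'],
                   'B': ['c', 'd', 'c', 'd'],
                   'C': [1, 2, 3, 4]})
df = df.set_index(['A', 'B'])

# Group by level A only: rows with the same A are summed across B
by_a = df.groupby(level='A').sum()
print(by_a)

# Group by level B only: rows with the same B are summed across A
by_b = df.groupby(level='B').sum()
print(by_b)
```

Grouping by A gives sums of 3 and 7 for 'a' and 'b', while grouping by B gives 4 and 6 for 'c' and 'd'.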
