Coding Ref

How to reshape a Pandas DataFrame

To reshape a Pandas DataFrame, you can use the pivot() and pivot_table() methods.

These methods rearrange the data in a DataFrame by turning the unique values of one column into new column labels and the values of another column into the row index.

Using pivot()

The pivot() method allows you to reshape a DataFrame by transforming the data into a new format, with different columns for each unique value in a specified column.

main.py
import pandas as pd

df = pd.DataFrame({'A': ['apple', 'banana', 'orange'],
                   'B': ['red', 'yellow', 'orange'],
                   'C': [1, 2, 3]})

# display the DataFrame
df

output
        A       B  C
0   apple     red  1
1  banana  yellow  2
2  orange  orange  3

This DataFrame has three columns: A, B, and C.

For example, suppose we want a new DataFrame with the values of column A as the index, one column for each unique value in column B, and the values of column C filling the cells.

We can use the pivot() method as follows:

main.py
# reshape the dataframe using the pivot() method
df_pivot = df.pivot(index='A', columns='B', values='C')

# display the resulting dataframe
df_pivot

This code will produce the following output:

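output
B       orange  red  yellow
A
apple      NaN  1.0     NaN
banana     NaN  NaN     2.0
orange     3.0  NaN     NaN

As you can see, the pivot() method has transformed the data into a new format, with the values of column A as the index and the unique values of column B as the column labels, sorted alphabetically.

The resulting DataFrame has one column for each unique value in column B, and each value in column C is placed in the column that matches its value in column B. Combinations of A and B that do not occur in the original data are filled with NaN, which is why the integers from column C are displayed as floats.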
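One caveat is that pivot() expects every index/column combination to appear at most once. The sketch below uses a hypothetical DataFrame, df_dup (not part of the example above), in which the pair ('apple', 'red') appears twice, to show that pivot() raises a ValueError in that situation.

```python
import pandas as pd

# hypothetical data: the ('apple', 'red') pair appears twice
df_dup = pd.DataFrame({'A': ['apple', 'apple', 'banana'],
                       'B': ['red', 'red', 'yellow'],
                       'C': [1, 4, 2]})

try:
    df_dup.pivot(index='A', columns='B', values='C')
    pivot_failed = False
except ValueError as err:
    # pivot() cannot decide which of the two values to keep
    pivot_failed = True
    print(err)
```

When the data can contain repeated combinations like this, an aggregating method such as pivot_table() is usually the better fit.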

Using pivot_table()

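The pivot_table() method reshapes a DataFrame in the same way as pivot(), but it also aggregates the values when several rows share the same index and column combination.

For example, suppose we want a new DataFrame with the values of column A as the index, one column for each unique value in column B, and the sum of C as the values.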

We can use the pivot_table() method as follows:

main.py
import pandas as pd

df = pd.DataFrame({'A': ['apple', 'banana', 'orange'],
                   'B': ['red', 'yellow', 'orange'],
                   'C': [1, 2, 3]})

# reshape the dataframe using the pivot_table() method
df_pivot_table = df.pivot_table(index='A', columns='B', values='C', aggfunc='sum')

# display the resulting dataframe
df_pivot_table

output
B       orange  red  yellow
A
apple      NaN  1.0     NaN
banana     NaN  NaN     2.0
orange     3.0  NaN     NaN

The pivot_table() method has aggregated the data using the sum function and has transformed it into a new format, with the values of column A as the index, the unique values of column B as the column labels, and the sum of C as the values.

Because every combination of A and B appears only once in this DataFrame, each sum is made up of a single value, so the result matches what pivot() produces. The aggregation only changes the result when a combination appears in more than one row.
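To see the aggregation actually change the result, here is a minimal sketch using a hypothetical DataFrame, df_dup (not part of the original example), in which the pair ('apple', 'red') appears twice. It also passes fill_value=0, an optional pivot_table() parameter, to replace the missing combinations with 0 instead of NaN.

```python
import pandas as pd

# hypothetical data: the ('apple', 'red') pair appears twice
df_dup = pd.DataFrame({'A': ['apple', 'apple', 'banana'],
                       'B': ['red', 'red', 'yellow'],
                       'C': [1, 4, 2]})

# the two ('apple', 'red') rows are summed into one cell: 1 + 4 = 5
summed = df_dup.pivot_table(index='A', columns='B', values='C',
                            aggfunc='sum')

# same reshape, but missing combinations become 0 instead of NaN
filled = df_dup.pivot_table(index='A', columns='B', values='C',
                            aggfunc='sum', fill_value=0)

print(summed)
print(filled)
```

Other aggregations, such as 'mean' or 'max', can be passed to aggfunc in the same way.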
