Coding Ref

How to make a crosstab in Pandas

A crosstab, also known as a contingency table or cross-tabulation, is a table that shows how often each combination of values occurs across two or more categorical variables. In Pandas, you can create one with the pd.crosstab() function, which takes the row values as its first argument and the column values as its second.

Here's an example:

main.py
import pandas as pd

# create a sample dataframe
df = pd.DataFrame({"Gender": ["Male", "Female", "Male", "Female", "Male"],
                   "IsMarried": ["Yes", "No", "Yes", "No", "No"]})

# create a crosstab
ct = pd.crosstab(df["Gender"], df["IsMarried"])

# display the crosstab
print(ct)

This will create a crosstab that shows the frequency of each combination of Gender and IsMarried values in the dataframe. The output will be:

output
IsMarried  No  Yes
Gender
Female      2    0
Male        1    2
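
The definition above mentions two or more variables. As a minimal sketch, assuming a hypothetical City column that is not part of the original example, you can pass a list of columns as the row argument to group by several variables at once:

```python
import pandas as pd

# Same sample data as above, plus a hypothetical "City" column (illustration only)
df3 = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
    "IsMarried": ["Yes", "No", "Yes", "No", "No"],
    "City": ["Oslo", "Oslo", "Bergen", "Bergen", "Oslo"],
})

# A list of Series as the first argument produces a two-level row index
ct3 = pd.crosstab([df3["Gender"], df3["City"]], df3["IsMarried"])

print(ct3)
```

The result has one row per Gender and City pair, and the counts still add up to the number of rows in the dataframe.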

You can also use the margins parameter to include the row and column totals in the crosstab. For example:

main.py
# create a crosstab with margins
ct = pd.crosstab(df["Gender"], df["IsMarried"], margins=True)

# display the crosstab
print(ct)

This will add an All row at the bottom of the crosstab with the total for each column, and an All column on the right with the total for each row. The bottom-right cell holds the grand total.

The output will be:

output
IsMarried  No  Yes  All
Gender
Female      2    0    2
Male        1    2    3
All         3    2    5
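
If the All label does not suit your report, the margins_name parameter of pd.crosstab() lets you rename it. A short sketch, where the label "Total" is just an illustrative choice:

```python
import pandas as pd

df = pd.DataFrame({"Gender": ["Male", "Female", "Male", "Female", "Male"],
                   "IsMarried": ["Yes", "No", "Yes", "No", "No"]})

# margins_name replaces the default "All" label on the totals row and column
ct_total = pd.crosstab(df["Gender"], df["IsMarried"],
                       margins=True, margins_name="Total")

print(ct_total)
```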

You can also use the normalize parameter to show relative frequencies (proportions between 0 and 1) instead of the raw counts.

For example:

main.py
# create a crosstab with relative frequencies within each row
ct = pd.crosstab(df["Gender"], df["IsMarried"], normalize="index")

# display the crosstab
print(ct)

This will divide each count by its row total, so the values in each Gender row sum to 1.

The output will be:

output
IsMarried        No       Yes
Gender
Female     1.000000  0.000000
Male       0.333333  0.666667
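
The normalized values are proportions, not percentages. One way to get percentages, sketched here with one decimal place as an arbitrary choice, is to multiply the result by 100 and round it:

```python
import pandas as pd

df = pd.DataFrame({"Gender": ["Male", "Female", "Male", "Female", "Male"],
                   "IsMarried": ["Yes", "No", "Yes", "No", "No"]})

# Row-normalized proportions, scaled to percentages and rounded to one decimal
pct = (pd.crosstab(df["Gender"], df["IsMarried"], normalize="index") * 100).round(1)

print(pct)
```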

You can also use the normalize parameter to compute the relative frequencies within each column instead of each row.

For example:

main.py
# create a crosstab with relative frequencies within each column
ct = pd.crosstab(df["Gender"], df["IsMarried"], normalize="columns")

# display the crosstab
print(ct)

This will divide each count by its column total, so the values in each IsMarried column sum to 1.

The output will be:

output
IsMarried        No  Yes
Gender
Female     0.666667  0.0
Male       0.333333  1.0
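
Besides "index" and "columns", normalize also accepts "all" (or True), which divides every count by the grand total. A minimal sketch using the same sample data:

```python
import pandas as pd

df = pd.DataFrame({"Gender": ["Male", "Female", "Male", "Female", "Male"],
                   "IsMarried": ["Yes", "No", "Yes", "No", "No"]})

# Each cell becomes its share of all rows in the dataframe
ct_all = pd.crosstab(df["Gender"], df["IsMarried"], normalize="all")

print(ct_all)
```

Here all four cells together sum to 1, so each value is the share of the whole sample that falls into that Gender and IsMarried combination.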
