A crosstab, also known as a contingency table or cross-tabulation, is a table that shows how often each combination of values occurs across two or more variables. In Pandas, you can create a crosstab using the pd.crosstab() function.
Here's an example:
import pandas as pd
# create a sample dataframe
df = pd.DataFrame({"Gender": ["Male", "Female", "Male", "Female", "Male"],
                   "IsMarried": ["Yes", "No", "Yes", "No", "No"]})
# create a crosstab
ct = pd.crosstab(df["Gender"], df["IsMarried"])
# display the crosstab
print(ct)
This will create a crosstab that shows the frequency of each combination of Gender and IsMarried values in the dataframe. The output will be:
IsMarried  No  Yes
Gender
Female      2    0
Male        1    2
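To make it clearer what the crosstab is counting, here is a minimal sketch that rebuilds the same table by hand with groupby(), size() and unstack(). The variable name manual is only illustrative, and this is one possible equivalent, not necessarily how pd.crosstab() works internally.

```python
import pandas as pd

# same sample dataframe as in the example above
df = pd.DataFrame({"Gender": ["Male", "Female", "Male", "Female", "Male"],
                   "IsMarried": ["Yes", "No", "Yes", "No", "No"]})

ct = pd.crosstab(df["Gender"], df["IsMarried"])

# count the rows for each (Gender, IsMarried) pair,
# then pivot the IsMarried level into columns; missing pairs become 0
manual = (df.groupby(["Gender", "IsMarried"])
            .size()
            .unstack(fill_value=0))

print(manual)

# compare the two tables cell by cell
print((ct.values == manual.values).all())
```

The fill_value=0 argument matters here: the Female/Yes pair never occurs in the data, so without it that cell would be NaN instead of 0.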
You can also use the margins parameter to include the row and column totals in the crosstab. For example:
# create a crosstab with margins
ct = pd.crosstab(df["Gender"], df["IsMarried"], margins=True)
# display the crosstab
print(ct)
This will add a row at the bottom showing the column totals and a column on the right showing the row totals, both labeled All.
The output will be:
IsMarried  No  Yes  All
Gender
Female      2    0    2
Male        1    2    3
All         3    2    5
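If the default All label is not what you want, pd.crosstab() also accepts a margins_name parameter. The sketch below assumes you want the label "Total" (an arbitrary choice for illustration) and checks that the margin column really is the sum of each row.

```python
import pandas as pd

# same sample dataframe as in the example above
df = pd.DataFrame({"Gender": ["Male", "Female", "Male", "Female", "Male"],
                   "IsMarried": ["Yes", "No", "Yes", "No", "No"]})

# margins_name sets the label used for the totals row and column
ct = pd.crosstab(df["Gender"], df["IsMarried"],
                 margins=True, margins_name="Total")
print(ct)

# strip the margins to get back the plain counts
inner = ct.drop(index="Total", columns="Total")

# each row total should equal the sum of the counts in that row
print((inner.sum(axis=1) == ct.loc[inner.index, "Total"]).all())
```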
You can also use the normalize parameter to show relative frequencies (proportions between 0 and 1) instead of the raw counts. For example:
# create a crosstab with relative frequencies, normalized within each row
ct = pd.crosstab(df["Gender"], df["IsMarried"], normalize="index")
# display the crosstab
print(ct)
This will show the relative frequency of each combination of Gender and IsMarried values, where each count is divided by the total for its Gender value, so every row sums to 1.
The output will be:
IsMarried        No       Yes
Gender
Female     1.000000  0.000000
Male       0.333333  0.666667
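The values above are proportions, not percentages. If percentages are easier to read, one simple approach is to multiply the result by 100 and round it. This is a sketch; the variable name pct and the one-decimal rounding are illustrative choices.

```python
import pandas as pd

# same sample dataframe as in the example above
df = pd.DataFrame({"Gender": ["Male", "Female", "Male", "Female", "Male"],
                   "IsMarried": ["Yes", "No", "Yes", "No", "No"]})

# row-normalized proportions, scaled to percentages and rounded to one decimal
pct = (pd.crosstab(df["Gender"], df["IsMarried"], normalize="index") * 100).round(1)

print(pct)
```

Because of the rounding, a row may not add up to exactly 100.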
You can also use the normalize parameter to normalize over the columns instead, so that each count is divided by its column total. For example:
# create a crosstab with relative frequencies, normalized within each column
ct = pd.crosstab(df["Gender"], df["IsMarried"], normalize="columns")
# display the crosstab
print(ct)
This will show the relative frequency of each combination of Gender and IsMarried values, where each count is divided by the total for its IsMarried value, so every column sums to 1.
The output will be:
IsMarried        No  Yes
Gender
Female     0.666667  0.0
Male       0.333333  1.0
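Besides "index" and "columns", the normalize parameter also accepts "all", which divides every count by the grand total. The following sketch uses the same sample data; the variable name ct_all is illustrative.

```python
import pandas as pd

# same sample dataframe as in the example above
df = pd.DataFrame({"Gender": ["Male", "Female", "Male", "Female", "Male"],
                   "IsMarried": ["Yes", "No", "Yes", "No", "No"]})

# divide every cell by the total number of rows (5 here)
ct_all = pd.crosstab(df["Gender"], df["IsMarried"], normalize="all")

print(ct_all)

# all cells together should add up to 1 (up to floating-point error)
print(ct_all.values.sum())
```

Here each cell is the share of the whole dataframe that falls into that combination, for example 2 of 5 rows for Female and No.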