In Pandas, the categorical data type stores values that belong to a limited, fixed set of categories or classes. It suits columns whose values are labels such as "Yes/No", "True/False", "Male/Female", or "High/Medium/Low".
Here's an example of using categorical data in Pandas:
import pandas as pd
# create a sample dataframe
df = pd.DataFrame({"ID": [1, 2, 3, 4, 5],
                   "Gender": ["Male", "Female", "Male", "Female", "Male"],
                   "IsMarried": ["Yes", "No", "Yes", "No", "No"]})
# convert the Gender and IsMarried columns to categorical data
df["Gender"] = df["Gender"].astype('category')
df["IsMarried"] = df["IsMarried"].astype('category')
# display the dataframe
print(df)
This creates a dataframe with three columns. Two of them, Gender and IsMarried, contain categorical data.
The output will be:
   ID  Gender IsMarried
0   1    Male       Yes
1   2  Female        No
2   3    Male       Yes
3   4  Female        No
4   5    Male        No
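The printed dataframe looks the same as it would with plain strings; the difference shows up in the column's dtype and its `.cat` accessor. The following is a minimal sketch of how to inspect that, using a standalone series named `gender` (an illustrative stand-in for `df["Gender"]`):

```python
import pandas as pd

# Stand-in for df["Gender"] from the example above
gender = pd.Series(["Male", "Female", "Male", "Female", "Male"]).astype("category")

# The dtype confirms the conversion
print(gender.dtype)                 # category

# The distinct categories, sorted alphabetically when inferred from strings
print(list(gender.cat.categories))  # ['Female', 'Male']

# Each value is stored as an integer code pointing into the categories
print(gender.cat.codes.tolist())    # [1, 0, 1, 0, 1]
```

Each row holds a small integer code rather than the full string, which is what the category order and the storage savings build on.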
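Categorical data has several advantages over plain strings: a column with many repeated values typically uses less memory, and the categories have a defined order that sorting follows.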
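One way to check the memory claim is to compare `memory_usage(deep=True)` before and after conversion. This is a rough sketch on made-up data (the series `labels` and its size are assumptions for illustration); the exact numbers depend on the pandas version and the data, so none are shown here:

```python
import pandas as pd

# Hypothetical data: two labels repeated many times
labels = pd.Series(["Male", "Female"] * 50_000)
as_category = labels.astype("category")

# deep=True counts the memory held by the string values themselves
plain_bytes = labels.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print("plain strings:", plain_bytes, "bytes")
print("categorical:  ", category_bytes, "bytes")
```

The saving comes from storing each distinct label once and keeping only a small integer code per row, so it shrinks when most values are unique.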
Sorting by a categorical column follows the order of its categories. For example, you can sort the dataframe by the Gender column like this:
# sort the dataframe by the Gender column
df = df.sort_values(by="Gender")
# display the sorted dataframe
print(df)
This sorts the dataframe in ascending order by the Gender column. Because Pandas orders the inferred categories alphabetically, the "Female" category appears first. The output will be:
   ID  Gender IsMarried
1   2  Female        No
3   4  Female        No
0   1    Male       Yes
2   3    Male       Yes
4   5    Male        No
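Alphabetical order is not always the one you want. For labels such as "High/Medium/Low", an ordered categorical lets you state the order explicitly. Here is a minimal sketch using `pd.CategoricalDtype`; the `tasks` dataframe and its column names are hypothetical and not part of the example above:

```python
import pandas as pd

# Hypothetical data for illustration
tasks = pd.DataFrame({"Task": ["A", "B", "C", "D"],
                      "Priority": ["Medium", "High", "Low", "High"]})

# Declare the logical order explicitly: Low < Medium < High
priority_type = pd.CategoricalDtype(categories=["Low", "Medium", "High"],
                                    ordered=True)
tasks["Priority"] = tasks["Priority"].astype(priority_type)

# Sorting follows the declared order, not the alphabet;
# kind="stable" keeps tied rows in their original order
sorted_tasks = tasks.sort_values(by="Priority", kind="stable")
print(sorted_tasks["Task"].tolist())     # ['C', 'A', 'B', 'D']

# Ordered categories also support comparisons against a category label
at_least_medium = tasks[tasks["Priority"] >= "Medium"]
print(at_least_medium["Task"].tolist())  # ['A', 'B', 'D']
```

Without `ordered=True`, sorting still works, but comparisons such as `>=` raise an error because the categories have no declared ranking.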