Coding Ref

What is categorical data in Pandas?

In Pandas, the category dtype stores values that belong to a limited, usually fixed, set of categories or classes. It suits columns whose values are labels such as "Yes/No", "True/False", "Male/Female", "High/Medium/Low", and so on.

Here's an example of using categorical data in Pandas:

main.py
import pandas as pd

# create a sample dataframe
df = pd.DataFrame({"ID": [1, 2, 3, 4, 5],
                   "Gender": ["Male", "Female", "Male", "Female", "Male"],
                   "IsMarried": ["Yes", "No", "Yes", "No", "No"]})

# convert the Gender and IsMarried columns to categorical data
df["Gender"] = df["Gender"].astype('category')
df["IsMarried"] = df["IsMarried"].astype('category')

# display the dataframe
print(df)

This will create a dataframe with three columns. The ID column keeps its integer type, while Gender and IsMarried now hold categorical data.

The output will be:

output
   ID  Gender IsMarried
0   1    Male       Yes
1   2  Female        No
2   3    Male       Yes
3   4  Female        No
4   5    Male        No
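
The printed dataframe looks the same as it would with plain strings, so it can help to inspect the columns directly. The snippet below is a minimal sketch that rebuilds the same sample data and uses the dtypes attribute and the .cat accessor to show the categories and the integer codes behind them. The variable names gender_categories and gender_codes are only illustrative.

```python
import pandas as pd

# Same sample data as above, rebuilt so this snippet runs on its own
df = pd.DataFrame({"ID": [1, 2, 3, 4, 5],
                   "Gender": ["Male", "Female", "Male", "Female", "Male"],
                   "IsMarried": ["Yes", "No", "Yes", "No", "No"]})
df["Gender"] = df["Gender"].astype("category")
df["IsMarried"] = df["IsMarried"].astype("category")

# dtypes shows which columns are categorical
print(df.dtypes)

# the distinct categories, inferred from the data and sorted alphabetically
gender_categories = list(df["Gender"].cat.categories)
print(gender_categories)  # ['Female', 'Male']

# the integer code stored for each row (an index into the categories)
gender_codes = df["Gender"].cat.codes.tolist()
print(gender_codes)  # [1, 0, 1, 0, 1]
```

Each code is the position of that row's value in the list of categories, which is why "Male" is stored as 1 and "Female" as 0 here.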

Categorical data has several advantages over storing the same values as plain strings.

  1. First, it usually takes up less memory, because each distinct category is stored once and every row holds only a small integer code. The saving shrinks, and can disappear, when most values in the column are unique.
  2. Second, it can make operations such as sorting and grouping faster, because they work on the integer codes rather than comparing strings.
  3. Third, it gives the categories a defined order, which is used when sorting. By default a categorical is unordered for comparison purposes, so operators such as < and > only work if it is created as an ordered categorical.
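
The memory claim can be checked with memory_usage(deep=True). The sketch below uses hypothetical data (150,000 rows drawn from three labels) rather than the small sample dataframe, since the difference only becomes visible on larger columns. The exact byte counts depend on the Pandas version and string storage, so none are shown here.

```python
import pandas as pd

# Hypothetical data: 150,000 rows with only three distinct labels
as_strings = pd.Series(["High", "Medium", "Low"] * 50_000)
as_category = as_strings.astype("category")

# deep=True counts the string contents, not just the references to them
string_bytes = as_strings.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(f"strings:  {string_bytes:,} bytes")
print(f"category: {category_bytes:,} bytes")

# with so few categories, each row is stored as a one-byte integer code
print(as_category.cat.codes.dtype)  # int8
```

On a column like this the categorical version should be considerably smaller. On a column where nearly every value is unique, the same comparison may show little or no benefit.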

Sorting a categorical column follows the order of its categories. For example, you can sort the dataframe by the Gender column like this:

main.py
# sort the dataframe by the Gender column
df = df.sort_values(by="Gender")

# display the sorted dataframe
print(df)

This will sort the dataframe in ascending order by the Gender column. Because astype('category') arranges the inferred categories alphabetically, the "Female" category appears before "Male". The output will be:

output
   ID  Gender IsMarried
1   2  Female        No
3   4  Female        No
0   1    Male       Yes
2   3    Male       Yes
4   5    Male        No
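
Alphabetical order is not always the meaningful one. For labels such as "High/Medium/Low", the order can be declared explicitly with pd.CategoricalDtype and ordered=True. The following is a sketch under that assumption. The tasks dataframe and its Task and Priority columns are made up for illustration and are not part of the example above.

```python
import pandas as pd

# Declare the order Low < Medium < High explicitly
priority_type = pd.CategoricalDtype(categories=["Low", "Medium", "High"],
                                    ordered=True)

# Hypothetical data for illustration
tasks = pd.DataFrame({"Task": ["A", "B", "C", "D"],
                      "Priority": ["High", "Low", "Medium", "Low"]})
tasks["Priority"] = tasks["Priority"].astype(priority_type)

# sorting follows the declared order, not alphabetical order;
# kind="stable" keeps rows with equal priority in their original order
sorted_tasks = tasks.sort_values(by="Priority", kind="stable")
print(sorted_tasks)

# ordered categoricals also support comparisons against a category label
at_least_medium = tasks[tasks["Priority"] >= "Medium"]
print(at_least_medium)
```

With an unordered categorical, the comparison in the last step would raise a TypeError, which is why ordered=True is set when the dtype is created.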
