Coding Ref

How to drop duplicate rows in Pandas


To drop duplicate rows in Pandas, use the DataFrame method drop_duplicates(). By default it compares all columns, keeps the first occurrence of each duplicated row, and returns a new DataFrame, leaving the original unchanged.

Here's an example of using drop_duplicates() in Pandas:

main.py
import pandas as pd

# create a sample dataframe
df = pd.DataFrame({"A": [1, 2, 3, 2, 1],
                   "B": [4, 5, 6, 5, 4],
                   "C": [7, 8, 9, 8, 7]})

# drop duplicate rows from the dataframe
df_unique = df.drop_duplicates()

# display the result
print(df_unique)

A row counts as a duplicate only if its values match an earlier row in every column. Rows 3 and 4 repeat rows 1 and 0 exactly, so they are dropped and the first occurrences are kept. The output will be:

output
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
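To see which rows drop_duplicates() treats as duplicates, the companion method DataFrame.duplicated() returns a boolean mask with the same default rule. The sketch below rebuilds the sample DataFrame, prints that mask, and checks that filtering with it matches drop_duplicates(); it also confirms the original df still has all five rows.

```python
import pandas as pd

# same sample dataframe as above
df = pd.DataFrame({"A": [1, 2, 3, 2, 1],
                   "B": [4, 5, 6, 5, 4],
                   "C": [7, 8, 9, 8, 7]})

# duplicated() flags each row that repeats an earlier row in every column
mask = df.duplicated()
print(mask.tolist())  # [False, False, False, True, True]

# keeping only the unflagged rows gives the same result as drop_duplicates()
manual = df[~mask]
print(manual.equals(df.drop_duplicates()))  # True

# drop_duplicates() returned a new object, so df itself is unchanged
print(len(df))  # 5
```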

You can also restrict the comparison to one or more columns by passing them in the subset parameter. For example:

main.py
# drop duplicate rows based on the A and B columns
df_unique = df.drop_duplicates(subset=["A", "B"])

# display the result
print(df_unique)

Here a row counts as a duplicate if its A and B values both match an earlier row, regardless of column C. In this sample, rows 3 and 4 still match rows 1 and 0 on A and B, so the output is the same:

output
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
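Because the sample data gives the same result with and without subset, the difference is easier to see with a DataFrame whose rows agree on A and B but not on C. The data below is hypothetical, made up only for this illustration.

```python
import pandas as pd

# hypothetical data: rows 0 and 2 share A and B but differ in C
df2 = pd.DataFrame({"A": [1, 2, 1],
                    "B": [4, 5, 4],
                    "C": [7, 8, 9]})

# comparing all columns: no row repeats an earlier one, so nothing is dropped
all_cols = df2.drop_duplicates()
print(len(all_cols))  # 3

# comparing only A and B: row 2 repeats row 0, so it is dropped
by_ab = df2.drop_duplicates(subset=["A", "B"])
print(by_ab.index.tolist())  # [0, 1]
```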

As you can see, drop_duplicates() removes duplicate rows from a DataFrame, comparing either all columns or only the columns listed in subset.
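drop_duplicates() also accepts a keep parameter that controls which occurrence survives, and an ignore_index parameter that renumbers the result. The sketch below applies them to the same sample DataFrame; the printed values are the resulting index labels.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 2, 1],
                   "B": [4, 5, 6, 5, 4],
                   "C": [7, 8, 9, 8, 7]})

# keep="first" (the default) keeps the first occurrence of each duplicate group
first = df.drop_duplicates(keep="first")
print(first.index.tolist())  # [0, 1, 2]

# keep="last" keeps the last occurrence instead
last = df.drop_duplicates(keep="last")
print(last.index.tolist())  # [2, 3, 4]

# keep=False drops every row that has a duplicate anywhere in the frame
none_kept = df.drop_duplicates(keep=False)
print(none_kept.index.tolist())  # [2]

# ignore_index=True relabels the result as 0..n-1
renumbered = df.drop_duplicates(keep="last", ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2]
```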
