To drop duplicate rows in Pandas, use the DataFrame.drop_duplicates() method. It returns a new DataFrame with the duplicate rows removed and leaves the original unchanged.
Here's an example of using drop_duplicates() in Pandas:
import pandas as pd
# create a sample dataframe
df = pd.DataFrame({"A": [1, 2, 3, 2, 1],
                   "B": [4, 5, 6, 5, 4],
                   "C": [7, 8, 9, 8, 7]})
# drop duplicate rows from the dataframe
df_unique = df.drop_duplicates()
# display the result
print(df_unique)
With no arguments, drop_duplicates() treats two rows as duplicates only when they match in every column. It keeps the first occurrence of each such row and returns a new DataFrame without the later copies. The output will be:
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
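Which copy of a duplicated row survives is controlled by the keep parameter of drop_duplicates(). The following is a minimal sketch reusing the sample data above; the variable names first, last and none_kept are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 2, 1],
                   "B": [4, 5, 6, 5, 4],
                   "C": [7, 8, 9, 8, 7]})

# keep="first" is the default: the first occurrence of each row survives
first = df.drop_duplicates()

# keep="last" keeps the last occurrence of each row instead
last = df.drop_duplicates(keep="last")

# keep=False drops every row that has a duplicate anywhere in the frame
none_kept = df.drop_duplicates(keep=False)

print(first.index.tolist())      # [0, 1, 2]
print(last.index.tolist())       # [2, 3, 4]
print(none_kept.index.tolist())  # [2]
print(len(df))                   # 5, the original dataframe is not modified
```

Note that the surviving rows keep their original index labels, so the index of the result may have gaps.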
You can also specify one or more columns to use as the criteria for identifying duplicates, using the subset parameter. For example:
# drop duplicate rows based on the A and B columns
df_unique = df.drop_duplicates(subset=["A", "B"])
# display the result
print(df_unique)
This drops every row whose values in the A and B columns repeat those of an earlier row, keeps the first occurrence, and returns a new DataFrame. Because the duplicated rows in this sample also match in column C, the output is the same as before:
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
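The subset parameter only changes the result when rows agree in the chosen columns but differ elsewhere. The sketch below uses a small made-up dataframe (df2 and its values are hypothetical, not part of the example above) in which two rows share A and B but have different values in C.

```python
import pandas as pd

# Hypothetical data: rows 0 and 2 match on A and B but differ in C
df2 = pd.DataFrame({"A": [1, 2, 1],
                    "B": [4, 5, 4],
                    "C": [7, 8, 10]})

# Comparing all columns: no two rows are fully identical, so nothing is dropped
all_cols = df2.drop_duplicates()

# Comparing only A and B: row 2 repeats row 0 and is dropped
ab_only = df2.drop_duplicates(subset=["A", "B"])

print(len(all_cols))           # 3
print(ab_only.index.tolist())  # [0, 1]
```

With subset, the values in the other columns of the dropped rows are discarded, so it is worth checking that the first occurrence is the one you want to keep.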
As you can see, the drop_duplicates() method is useful for removing duplicate rows from a DataFrame, based on the values in all columns or in a chosen subset of them.