To drop duplicate columns in a Pandas DataFrame, you can use the following code:
df = df.loc[:,~df.columns.duplicated()].copy()
Here is how this code works:
df.columns returns the column labels of the DataFrame as an Index.
duplicated() is called on those labels and returns a boolean array that is True for every label that repeats an earlier one. By default, the first occurrence is not marked as a duplicate.
The ~ operator inverts that array, so ~df.columns.duplicated() evaluates to True for columns that are not duplicates and False for columns that are duplicates.
loc subsets the DataFrame, using ~df.columns.duplicated() as a filter to select only the columns that are not duplicates.
copy() is called on the result to create a new DataFrame object with the duplicate columns removed.
The resulting DataFrame is assigned to the df variable, overwriting the original DataFrame.
Note that this compares column names only, so two columns with different names but identical values are both kept.
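The steps above can be tried on a small DataFrame. The following is a minimal sketch; the data and the column names are made up for illustration, and the intermediate variable names (keep, deduped) are not part of the original one-liner.

```python
import pandas as pd

# Hypothetical data: the column name "a" appears twice.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "a"])

# True for columns to keep, False for repeated column names.
keep = ~df.columns.duplicated()

# Select the kept columns and copy the result into a new DataFrame.
deduped = df.loc[:, keep].copy()

print(list(deduped.columns))  # ['a', 'b']
```

The second "a" column is dropped because its name repeats an earlier one; the first "a" column and its values are left untouched.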
Use the following code to remove rows with duplicated index labels:
df = df.loc[~df.index.duplicated(),:].copy()
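As a quick check of the line above, the following sketch applies it to a small DataFrame with a repeated index label. The data and labels are invented for illustration. It also shows the keep="last" argument of duplicated(), which is not used in the original one-liner, for cases where the last occurrence should win instead of the first.

```python
import pandas as pd

# Hypothetical data: the index label "x" appears twice.
df = pd.DataFrame({"value": [10, 20, 30]}, index=["x", "y", "x"])

# Default behaviour: the first occurrence of each label is kept.
keep_first = ~df.index.duplicated()
first = df.loc[keep_first, :].copy()

# Variation: keep the last occurrence of each label instead.
keep_last = ~df.index.duplicated(keep="last")
last = df.loc[keep_last, :].copy()

print(first["value"].tolist())  # [10, 20]
print(last["value"].tolist())   # [20, 30]
```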
df.index returns the index (i.e. the row labels) of the DataFrame.
duplicated() is called on the index and returns a boolean array that is True for every label that repeats an earlier one.
The ~ operator inverts that array, so ~df.index.duplicated() evaluates to True for rows that are not duplicates and False for rows that are duplicates.
loc subsets the DataFrame, using ~df.index.duplicated() as a filter to select only the rows that are not duplicates.
copy() is called on the result to create a new DataFrame object with the duplicate rows removed.
Only the index labels are compared here, not the values in the rows.

To drop duplicate columns by value rather than by name, use the following code:
df = df.loc[:,~df.apply(lambda x: x.duplicated(),axis=1).all()].copy()
apply() is called on the DataFrame with a lambda function as the argument. Because axis=1 is passed, the lambda receives one row (x) at a time, and x.duplicated() marks each value in that row that has already appeared in an earlier column of the same row.
all() is called on the resulting boolean DataFrame and reduces it column by column, returning True for a column only if its value was marked as a duplicate in every row.
The ~ operator inverts the logical values returned by all(), so ~df.apply(lambda x: x.duplicated(),axis=1).all() evaluates to True for columns that are not all duplicates and False for columns that are all duplicates.
loc subsets the DataFrame, using ~df.apply(lambda x: x.duplicated(),axis=1).all() as a filter to select only the columns that are not all duplicates.
copy() is called on the result to create a new DataFrame object with the duplicate columns removed.
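To see what the value-based filter does in practice, here is a minimal sketch on made-up data. The DataFrames and the variable names (row_dupes, drop, deduped, mixed, mixed_result) are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data: "b" holds the same values as "a"; "c" is different.
df = pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3], "c": [4, 5, 6]})

# Row by row, mark values that already appeared in an earlier column.
row_dupes = df.apply(lambda x: x.duplicated(), axis=1)

# A column is dropped only if it was marked in every row.
drop = row_dupes.all()
deduped = df.loc[:, ~drop].copy()

print(list(deduped.columns))  # ['a', 'c']

# Caveat: "c" below equals "a" in the first row and "b" in the second,
# so it is marked in every row and dropped, although it is not a copy
# of any single column.
mixed = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [1, 4]})
mixed_result = mixed.loc[:, ~mixed.apply(lambda x: x.duplicated(), axis=1).all()].copy()

print(list(mixed_result.columns))  # ['a', 'b']
```

The second case suggests that this filter is best suited to data where a repeated value within a row can only come from a genuinely duplicated column.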