Coding Ref

How to shuffle data in Pandas

Shuffle rows using sample

To shuffle the rows of a Pandas DataFrame, you can use the sample method. sample returns a random selection of rows, so asking it for every row gives you the whole DataFrame back in a random order.

For example, to shuffle the DataFrame df, you could do the following:

main.py
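import pandas as pd

# Example data so the snippet runs on its own
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Shuffle all rows; frac=1 samples 100% of them
df = df.sample(frac=1)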

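In the code above, the sample method is applied to the DataFrame with the frac parameter set to 1. frac is the fraction of rows to return, so 1 means all of them.

Because sample draws without replacement by default, each row appears exactly once. The result is a new DataFrame with the same rows as the original in a random order, and each row keeps its original index label.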
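Two follow-up needs often come up with this approach: making the shuffle repeatable and renumbering the rows afterwards. The sketch below shows one way to do both, using the random_state parameter of sample and reset_index. The example data and the seed value 42 are arbitrary choices for illustration.

```python
import pandas as pd

# Hypothetical example data; any DataFrame works the same way.
df = pd.DataFrame({"name": ["a", "b", "c", "d"], "score": [10, 20, 30, 40]})

# random_state fixes the seed, so the same order comes back on every run.
shuffled = df.sample(frac=1, random_state=42)

# sample keeps the original index labels; drop them and renumber from 0.
shuffled = shuffled.reset_index(drop=True)

print(shuffled)
```

Leaving out random_state gives a different order on each run, which is usually what you want outside of tests and reproducible experiments.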

Shuffle rows using the shuffle function from sklearn.utils

If scikit-learn is installed, you can also use the shuffle function from the sklearn.utils module to shuffle the rows of a DataFrame.

This function takes the DataFrame as input and returns a new DataFrame with the rows in a random order. For example:

main.py
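import pandas as pd
from sklearn.utils import shuffle

# Example data so the snippet runs on its own
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Shuffle the rows; a new DataFrame is returned
df = shuffle(df)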

In the code above, the shuffle function shuffles the rows of df, and the result is assigned back to df. The function returns a shuffled copy rather than modifying the original DataFrame in place.
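One reason to reach for sklearn.utils.shuffle instead of sample is that it accepts several inputs at once and applies the same random order to all of them. The sketch below assumes scikit-learn is installed; the names features and labels and the seed value 0 are made up for illustration.

```python
import pandas as pd
from sklearn.utils import shuffle

# Hypothetical example data: each label is 10 times its feature value.
features = pd.DataFrame({"x": [1, 2, 3, 4]})
labels = pd.Series([10, 20, 30, 40])

# Both inputs are shuffled with the same permutation, so rows stay paired.
# random_state makes the result repeatable.
features_shuffled, labels_shuffled = shuffle(features, labels, random_state=0)

print(features_shuffled)
print(labels_shuffled)
```

This is handy when the data and its target values live in separate objects and must stay aligned after shuffling.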

Both approaches, as written above, shuffle the rows of a DataFrame and leave the column order unchanged.

Shuffle columns

If you want to shuffle the columns of a DataFrame, you can use the numpy.random.permutation function to generate an array of shuffled column names, and then use this array to reorder the columns of the DataFrame.

For example:

main.py
import numpy as np
import pandas as pd

# Example data so the snippet runs on its own
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Get the column names of the DataFrame
columns = df.columns

# Use numpy to randomly permute the column names
columns = np.random.permutation(columns)

# Use the shuffled column names to reorder the columns of the DataFrame
df = df[columns]

In the code above, numpy.random.permutation returns a shuffled copy of the column names as an array.

Indexing the DataFrame with that array reorders the columns. The result is a DataFrame with the columns in a random order; the data within each column is unchanged.
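As an alternative sketch, the sample method also accepts an axis argument, so columns can be shuffled without NumPy by passing axis=1. The example data and the seed value 7 are arbitrary choices for illustration.

```python
import pandas as pd

# Hypothetical example data with three columns.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# axis=1 samples columns instead of rows; frac=1 keeps all of them.
# random_state makes the column order repeatable.
shuffled_cols = df.sample(frac=1, axis=1, random_state=7)

print(list(shuffled_cols.columns))
```

Either approach only changes the order of the columns; the values stay attached to their column names.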
