Coding Ref

How to normalize a column in Pandas

To normalize a column in Pandas, you can select the column and pass a normalization function to the apply method. When called on a DataFrame, apply passes each selected column to the function as a Series and returns a new DataFrame containing the normalized values. The original DataFrame is left unchanged.

Example

For example, consider the following DataFrame:

main.py
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

df
output
   A   B    C
0  1  10  100
1  2  20  200
2  3  30  300
3  4  40  400
4  5  50  500

This DataFrame has three columns A, B, and C, with five rows of data.

To normalize the values in a specific column, you could do the following:

main.py
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

# Define a normalization function
def normalize(x):
    return (x - x.min()) / (x.max() - x.min())

# Apply the normalization function to column B
normalized = df[['B']].apply(normalize)

# Print the resulting DataFrame
print(normalized)
output
      B
0  0.00
1  0.25
2  0.50
3  0.75
4  1.00

In the code above, a normalization function is defined that takes a Series object as an argument, and returns a new Series object containing the normalized values.

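This function calculates the minimum and maximum values of the Series, and then applies the min-max normalization formula (x - x.min()) / (x.max() - x.min()) to the whole Series at once. The smallest value maps to 0, the largest maps to 1, and every other value falls proportionally in between.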
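The arithmetic behind the formula can be checked by hand on column B, where the minimum is 10 and the maximum is 50. The snippet below is a minimal sketch that works on a standalone Series; the variable names b, b_min, b_max, and scaled are illustrative and not part of the original example.

```python
import pandas as pd

# Column B from the example, as a standalone Series
b = pd.Series([10, 20, 30, 40, 50], name='B')

b_min = b.min()  # 10
b_max = b.max()  # 50

# The subtraction and division act on the whole Series at once,
# e.g. the value 20 becomes (20 - 10) / (50 - 10) = 0.25
scaled = (b - b_min) / (b_max - b_min)

print(scaled.tolist())
```

The printed values should match the B column in the output above.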

The normalize function is then applied to the B column of the DataFrame using the apply method. Because the column is selected with double brackets (df[['B']]), the selection is a one-column DataFrame, so apply passes column B to normalize as a whole Series and returns a new DataFrame rather than a Series. In this case, the resulting B column has the values 0.0, 0.25, 0.5, 0.75, and 1.0.
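The type of the result depends on how the column is selected. The following sketch compares the double-bracket selection used above with a single-bracket selection passed straight to normalize; the names as_frame and as_series are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'B': [10, 20, 30, 40, 50]})

def normalize(x):
    return (x - x.min()) / (x.max() - x.min())

# Double brackets select a one-column DataFrame; apply returns a DataFrame
as_frame = df[['B']].apply(normalize)

# Single brackets select a Series; calling normalize directly returns a Series
as_series = normalize(df['B'])

print(type(as_frame).__name__)
print(type(as_series).__name__)
```

Both approaches compute the same values, so the choice mainly depends on whether a DataFrame or a Series is more convenient for the next step.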

You can also specify multiple columns when using the apply method to normalize data in a Pandas DataFrame.

For example, if you wanted to normalize both columns B and C, you could do the following:

main.py
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

# Define a normalization function
def normalize(x):
    return (x - x.min()) / (x.max() - x.min())

# Apply the normalization function to columns B and C
normalized = df[['B', 'C']].apply(normalize)

# Print the resulting DataFrame
print(normalized)
output
      B     C
0  0.00  0.00
1  0.25  0.25
2  0.50  0.50
3  0.75  0.75
4  1.00  1.00

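In the code above, the normalization function is applied to the B and C columns of the DataFrame using the apply method. Each column is scaled independently using its own minimum and maximum, which is why B and C end up with identical normalized values even though their original ranges differ.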
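The examples above print the normalized values without storing them. If the normalized columns should sit alongside the untouched column A, one possible approach is to assign the result back to a copy of the DataFrame. This is a sketch under that assumption; the names cols and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

def normalize(x):
    return (x - x.min()) / (x.max() - x.min())

cols = ['B', 'C']

# Work on a copy so the original df keeps its raw values
result = df.copy()
result[cols] = df[cols].apply(normalize)

print(result)
```

Column A keeps its original values in result, while B and C hold the normalized values.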
