To normalize a column in Pandas, you can use the apply method to apply a normalization function to the column. When called on a DataFrame, apply passes each selected column to the function as a Series and collects the results into a new DataFrame containing the normalized values.
For example, consider the following DataFrame:
import pandas as pd
df = pd.DataFrame({
'A': [1, 2, 3, 4, 5],
'B': [10, 20, 30, 40, 50],
'C': [100, 200, 300, 400, 500]
})
df
   A   B    C
0  1  10  100
1  2  20  200
2  3  30  300
3  4  40  400
4  5  50  500
This DataFrame has three columns, A, B, and C, with five rows of data.
To normalize the values in a specific column, you could do the following:
import pandas as pd
df = pd.DataFrame({
'A': [1, 2, 3, 4, 5],
'B': [10, 20, 30, 40, 50],
'C': [100, 200, 300, 400, 500]
})
# Define a normalization function
def normalize(x):
    return (x - x.min()) / (x.max() - x.min())
# Apply the normalization function to column B
normalized = df[['B']].apply(normalize)
# Print the resulting DataFrame
print(normalized)
      B
0  0.00
1  0.25
2  0.50
3  0.75
4  1.00
In the code above, a normalization function is defined that takes a Series object as an argument and returns a new Series object containing the normalized values. This function calculates the minimum and maximum values of the Series, and then applies the min-max formula (x - x.min()) / (x.max() - x.min()) to every element at once, which rescales the values to the range 0 to 1.
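To make the formula concrete, here is a minimal sketch that works through the arithmetic for column B by hand. The variable names lo, hi, span, middle, and scaled are illustrative only and do not appear in the original example.

```python
import pandas as pd

# Column B from the example above
b = pd.Series([10, 20, 30, 40, 50], name="B")

lo, hi = b.min(), b.max()  # smallest and largest values in the column
span = hi - lo             # width of the original range

# Normalize a single value by hand: 30 sits halfway between 10 and 50
middle = (30 - lo) / span

# The same arithmetic applied to the whole Series at once
scaled = (b - lo) / span
```

The smallest value always maps to 0 and the largest to 1, with everything else falling proportionally in between.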
The normalize function is then applied to the B column of the DataFrame using the apply method. Because df[['B']] is a one-column DataFrame, apply passes column B to normalize as a Series and returns a new DataFrame containing the normalized values. In this case, the resulting column has the values 0.0, 0.25, 0.5, 0.75, and 1.0.
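The difference between selecting with double and single brackets matters here, so the following sketch contrasts the two. It assumes the same normalize function as above. The names as_frame and as_series are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"B": [10, 20, 30, 40, 50]})

def normalize(x):
    return (x - x.min()) / (x.max() - x.min())

# Double brackets select a one-column DataFrame; apply() hands each column
# to normalize() as a Series and returns a DataFrame.
as_frame = df[["B"]].apply(normalize)

# Single brackets select a Series, so normalize() can be called on it directly.
as_series = normalize(df["B"])
```

Both routes produce the same numbers. Calling the function directly on the Series is often the simpler choice for a single column, whereas Series.apply works one value at a time and so would not give the function access to the column's minimum and maximum.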
You can also select multiple columns when using the apply method to normalize data in a Pandas DataFrame. For example, if you wanted to normalize both columns B and C, you could do the following:
import pandas as pd
df = pd.DataFrame({
'A': [1, 2, 3, 4, 5],
'B': [10, 20, 30, 40, 50],
'C': [100, 200, 300, 400, 500]
})
# Define a normalization function
def normalize(x):
    return (x - x.min()) / (x.max() - x.min())
# Apply the normalization function to columns B and C
normalized = df[['B', 'C']].apply(normalize)
# Print the resulting DataFrame
print(normalized)
      B     C
0  0.00  0.00
1  0.25  0.25
2  0.50  0.50
3  0.75  0.75
4  1.00  1.00
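In the code above, the normalization function is applied to the B and C columns of the DataFrame using the apply method. Each column is normalized independently, using its own minimum and maximum, which is why both columns end up with the same values.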
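The examples above store the result in a separate object. If you would rather keep the normalized values alongside the untouched columns, one possible approach is sketched below. It assumes you want to leave the original DataFrame unchanged, so it works on a copy. The names cols and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [10, 20, 30, 40, 50],
    "C": [100, 200, 300, 400, 500],
})

def normalize(x):
    return (x - x.min()) / (x.max() - x.min())

# Columns to normalize (an illustrative choice; column A is left as it is)
cols = ["B", "C"]

# Work on a copy so the original DataFrame keeps its raw values
result = df.copy()
result[cols] = df[cols].apply(normalize)
```

Note that if a column contains the same value in every row, its maximum minus its minimum is zero, and this formula yields NaN for that column.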