How to calculate the standard deviation in Pandas DataFrame

To calculate the standard deviation for a column in a Pandas DataFrame, you can use the std method.

When called on a Series (a single column), std returns the standard deviation of the values in that Series. When called on a DataFrame, it returns one standard deviation per column.

Example

For example, consider the following DataFrame:

main.py
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

This DataFrame has three columns A, B, and C, with five rows of data.

To calculate the standard deviation for a specific column, you could do the following:

main.py
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

# Calculate the standard deviation for column B
std = df['B'].std()

# Print the resulting value
print(std)
output
15.811388300841896

In the code above, the std method is applied to the B column of the DataFrame, which calculates the standard deviation for the elements in that column.

In this case, the resulting standard deviation is 15.811388300841896.
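Note that std returns the sample standard deviation by default (it divides by n - 1, controlled by the ddof parameter, which defaults to 1). The following is a minimal sketch of how the population standard deviation could be obtained instead by passing ddof=0. The variable names are chosen for illustration only, and the NumPy comparison is included just to show that np.std uses ddof=0 by default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

# Sample standard deviation (default, ddof=1): divides by n - 1
sample_std = df['B'].std()

# Population standard deviation (ddof=0): divides by n
population_std = df['B'].std(ddof=0)

# NumPy's std defaults to ddof=0, so it matches the population value
numpy_std = np.std(df['B'].to_numpy())

print(sample_std)      # 15.811388300841896, as in the example above
print(population_std)  # approximately 14.142
print(numpy_std)       # same as population_std
```

If your five rows are the complete set of values you care about, ddof=0 is usually the appropriate choice. If they are a sample drawn from a larger population, the default is the usual one.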

Calculate the standard deviation of multiple columns

You can also calculate the standard deviation for several columns at once by selecting them with a list of column names before calling the std method.

For example, if you wanted to calculate the standard deviation for both columns B and C, you could do the following:

main.py
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

# Calculate the standard deviation for columns B and C
std = df[['B', 'C']].std()

# Print the resulting Series
print(std)
output
B     15.811388
C    158.113883
dtype: float64

In the code above, the std method is applied to the B and C columns of the DataFrame, which calculates the standard deviation for the elements in those columns.

The result is a new Series object, indexed by column name, containing the standard deviation for each selected column.
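Because the result is a Series indexed by column name, individual values can be read back by label. The following is a small sketch of that. The names b_std and std_by_column are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

# Standard deviation for columns B and C, returned as a Series
std = df[['B', 'C']].std()

# Look up a single column's standard deviation by its label
b_std = std['B']

# Convert the Series to a plain dictionary keyed by column name
std_by_column = std.to_dict()

print(b_std)
print(std_by_column)
```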

Calculate the standard deviation for the entire DataFrame

You can also calculate the standard deviation for every column of the DataFrame at once. The result is one value per column, not a single value for the whole DataFrame.

One way to do this is to use the apply method to call std on each column, as shown in the following example:

main.py
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

# Calculate the standard deviation for each column of the DataFrame
std = df.apply(lambda x: x.std())

# Print the resulting Series
print(std)
output
A      1.581139
B     15.811388
C    158.113883
dtype: float64
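For numeric columns, the apply call above should give the same result as calling std directly on the DataFrame, which is shorter. The sketch below shows that, along with the axis=1 argument for computing the standard deviation across each row instead of each column. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

# Standard deviation of each column, without apply
column_std = df.std()

# The apply-based version from the example above, for comparison
applied_std = df.apply(lambda x: x.std())

# Standard deviation across each row (one value per row)
row_std = df.std(axis=1)

print(column_std)
print(row_std)
```

If the DataFrame also contains non-numeric columns, passing numeric_only=True to std restricts the calculation to the numeric ones.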
