Coding Ref

How to calculate the variance in Pandas DataFrame

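To calculate the variance of a column in a Pandas DataFrame, use the var method.

Called on a Series (a single column), var returns the sample variance of the elements in that Series. By default pandas uses ddof=1, so the sum of squared deviations from the mean is divided by n - 1 rather than n.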

Example

For example, consider the following DataFrame:

main.py
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

This DataFrame has three columns A, B, and C, with five rows of data.

To calculate the variance for a specific column, you could do the following:

main.py
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

# Calculate the variance for column B
variance = df['B'].var()

# Print the resulting value
print(variance)

In the code above, the var method is applied to the B column of the DataFrame, which calculates the variance for the elements in that column.

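In this case, the resulting variance is 250.0. This is the sample variance: the squared deviations from the mean (30) sum to 1000, and dividing by n - 1 = 4 gives 250.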
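As a quick check of the number above, the following sketch compares the default sample variance with the population variance (ddof=0) and with a manual calculation. The variable names sample_var, population_var, and manual_var are illustrative only and not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'B': [10, 20, 30, 40, 50]})
col = df['B']

# Sample variance (pandas default, ddof=1): divide by n - 1
sample_var = col.var()

# Population variance: divide by n
population_var = col.var(ddof=0)

# Manual check of the sample variance formula
mean = col.mean()
manual_var = ((col - mean) ** 2).sum() / (len(col) - 1)

print(sample_var)      # 250.0
print(population_var)  # 200.0
print(manual_var)      # 250.0
```

If you need to match NumPy's default behaviour, note that numpy.var uses ddof=0, so the two libraries give different results unless ddof is set explicitly.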

Calculate the variance of multiple columns

The var method also works on several columns at once: select them with a list of column names and call var on the resulting DataFrame.

For example, if you wanted to calculate the variance for both columns B and C, you could do the following:

main.py
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

# Calculate the variance for columns B and C
variance = df[['B', 'C']].var()

# Print the resulting Series
print(variance)

output
B      250.0
C    25000.0
dtype: float64

In the code above, the var method is applied to the B and C columns of the DataFrame, which calculates the variance for the elements in those columns. The result is a new Series object containing the variance for each column.
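Because the result is a Series indexed by column name, individual variances can be looked up by label. A minimal sketch, reusing the same DataFrame as above:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

variance = df[['B', 'C']].var()

# The index of the result holds the selected column names
print(list(variance.index))  # ['B', 'C']

# Look up a single column's variance by its label
print(variance['C'])  # 25000.0
```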

Calculate the variance of the entire DataFrame

You can also calculate the variance of every column in the DataFrame at once.

One way to do this is to combine the apply method with the var method, as shown in the following example. Calling df.var() directly on the DataFrame returns the same column-wise result.

main.py
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

# Calculate the variance for the entire DataFrame
variance = df.apply(lambda x: x.var())

# Print the resulting Series
print(variance)

output
A        2.5
B      250.0
C    25000.0
dtype: float64

In the code above, the apply method calls var on each column of the DataFrame, and the result is a Series with one variance per column.
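The sketch below shows the shorter DataFrame.var call next to the apply version. It also shows one possible reading of "the entire DataFrame", namely a single variance over all values pooled together; that interpretation, and the names direct, via_apply, and overall, are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

# Column-wise variance, called directly on the DataFrame
direct = df.var()

# The same calculation written with apply
via_apply = df.apply(lambda x: x.var())

print(direct)
print(via_apply)

# A single sample variance over all 15 values, treated as one flat collection
overall = pd.Series(df.to_numpy().ravel()).var()
print(overall)
```

Pooling all values only makes sense when the columns share a unit and scale, so the column-wise form is usually the one you want.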
