To calculate the variance for a column in a Pandas DataFrame, you can use the `var` method. When it is applied to a `Series` object, it returns the variance of the elements in that `Series`.
For example, consider the following DataFrame:
```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})
```
This DataFrame has three columns, `A`, `B`, and `C`, with five rows of data.
To calculate the variance for a specific column, you could do the following:
```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

# Calculate the variance for column B
variance = df['B'].var()

# Print the resulting value
print(variance)
```
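In the code above, `df['B']` selects column `B` as a `Series`, and calling `var` on it calculates the variance of the elements in that column. In this case, the resulting variance is 250.0. By default, `var` computes the sample variance (`ddof=1`), which divides the sum of squared deviations from the mean by n - 1 rather than by n.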
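To see where 250.0 comes from, the sketch below recomputes the variance of column `B` by hand and compares it with `var`. It also passes `ddof=0` to obtain the population variance, which divides by n instead of n - 1. The variable names are illustrative only.

```python
import pandas as pd

# Same values as column B in the example above
s = pd.Series([10, 20, 30, 40, 50])

n = len(s)
mean = s.mean()                      # 30.0
sum_sq = ((s - mean) ** 2).sum()     # 400 + 100 + 0 + 100 + 400 = 1000.0

sample_var = sum_sq / (n - 1)        # 1000 / 4 = 250.0 (matches the default ddof=1)
population_var = sum_sq / n          # 1000 / 5 = 200.0 (matches ddof=0)

print(sample_var, s.var())
print(population_var, s.var(ddof=0))
```

If your data represents an entire population rather than a sample, `ddof=0` is the appropriate choice.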
You can also calculate the variance of several columns at once by selecting them with a list of column names and calling `var` on the resulting DataFrame. For example, to calculate the variance for both columns `B` and `C`, you could do the following:
```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

# Calculate the variance for columns B and C
variance = df[['B', 'C']].var()

# Print the resulting Series
print(variance)
# Output:
# B      250.0
# C    25000.0
# dtype: float64
```
In the code above, `df[['B', 'C']]` returns a DataFrame containing only columns `B` and `C`, and calling `var` on it calculates the variance of each column. The result is a new `Series` object, indexed by column name, that holds the variance for each column.
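Because the result is indexed by column name, you can look up the variance of an individual column by its label. The sketch below (the name `variances` is illustrative) also checks that each entry matches calling `var` on that column alone.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

variances = df[['B', 'C']].var()

# Look up a single column's variance by its label
print(variances['C'])  # 25000.0

# Each entry equals the variance computed on that column by itself
for col in ['B', 'C']:
    print(col, variances[col] == df[col].var())
```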
To calculate the variance of every column in the DataFrame, you can call `var` on the whole DataFrame (`df.var()`). Alternatively, you can use the `apply` method to call `var` on each column, as shown in the following example:
```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

# Calculate the variance for each column of the DataFrame
variance = df.apply(lambda x: x.var())

# Print the resulting Series
print(variance)
# Output:
# A        2.5
# B      250.0
# C    25000.0
# dtype: float64
```
In the code above, `apply` passes each column to the lambda as a `Series` and calls `var` on it. The result is the same per-column `Series` that `df.var()` returns.
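To confirm that the two approaches agree, the sketch below computes the per-column variance both with `DataFrame.var` directly and with `apply`, then compares the results. The names `direct` and `via_apply` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
})

# Per-column variance computed directly on the DataFrame
direct = df.var()

# Per-column variance computed by applying Series.var to each column
via_apply = df.apply(lambda x: x.var())

print(direct)
print(direct.equals(via_apply))  # True
```

Calling `df.var()` directly is shorter and avoids the per-column Python function call. `apply` is mainly useful when you need to wrap `var` in additional custom logic.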