The astype() method in pandas is used to change the data type of one or more columns in a DataFrame.
Here's an example:
import pandas as pd
# create a sample dataframe
df = pd.DataFrame([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
# change the data type of the 2nd column to float
df[1] = df[1].astype(float)
# check the data types of the columns
print(df.dtypes)
In this example, the data type of the 2nd column (label 1) is changed to float, and the dtypes attribute is used to check the data types of all the columns in the DataFrame. The output will be:
0 int64
1 float64
2 int64
dtype: object
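The example assigns the result back to df[1] because astype() returns a new object and does not modify the original in place. A minimal sketch of that behavior (the names s and converted are illustrative only):

```python
import pandas as pd

# a small integer Series; the variable names are illustrative only
s = pd.Series([1, 2, 3])

# astype() returns a new Series and leaves the original untouched
converted = s.astype(float)

print(s.dtype)          # dtype of the original Series (still an integer type)
print(converted.dtype)  # dtype of the converted copy
```

If the result is not assigned to anything, the conversion is simply discarded.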
You can also use astype() to change multiple columns at once by passing a dictionary as the dtype argument. The keys of the dictionary should be the column labels, and the values should be the desired data types.
For example:
# change the data type of multiple columns
df = df.astype({0: float, 2: str})
# check the data types of the columns
print(df.dtypes)
This will change the data type of the 1st and 3rd columns (labels 0 and 2) to float and str, respectively.
The output will be:
0 float64
1 float64
2 object
dtype: object
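Dictionary keys are matched against the column labels, so a DataFrame with string column names takes string keys. A small sketch, assuming a hypothetical frame called people with made-up columns:

```python
import pandas as pd

# hypothetical column names, chosen only for this illustration
people = pd.DataFrame({"age": [25, 32, 47], "score": [1, 2, 3], "id": [10, 11, 12]})

# keys must match the column labels exactly;
# columns not listed in the dictionary ("score" here) are left unchanged
people = people.astype({"age": float, "id": str})

print(people.dtypes)
```

A key that does not match any column label raises a KeyError, so integer labels such as 0 need integer keys, not the string "0".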
You can also use astype() to change the data type of the entire DataFrame by calling it on the DataFrame itself, rather than on a column.
This will change the data type of all the columns to the specified data type:
# change the data type of the entire DataFrame
df = df.astype(int)
# check the data types of the columns
print(df.dtypes)
In this case, the output will be:
0 int64
1 int64
2 int64
dtype: object
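The target type does not have to be a Python type. astype() also accepts NumPy dtype names given as strings. A brief sketch, using an illustrative frame named frame:

```python
import pandas as pd

# illustrative data; the variable names are not from the original example
frame = pd.DataFrame([[1, 2], [3, 4]])

# a string alias such as "float32" selects a specific NumPy dtype for every column
frame32 = frame.astype("float32")

print(frame32.dtypes)
```

This is useful when a specific width (for example 32-bit rather than 64-bit) matters for memory use.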
If you use astype() to convert a column to a data type that is not compatible with the values in that column, pandas raises a ValueError.
For example, converting a column of strings to int will raise a ValueError if any of the strings cannot be parsed as an integer.
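The failure can be observed, and handled, with an ordinary try/except. A minimal sketch (the Series s and the flag failed are illustrative names):

```python
import pandas as pd

# two numeric strings and one value that cannot be parsed as an integer
s = pd.Series(["1", "2", "x"])

failed = False
try:
    s.astype(int)
except ValueError as exc:
    # the whole conversion is rejected, not just the offending value
    failed = True
    print("conversion failed:", exc)
```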
You can use the errors parameter to specify how to handle such errors.
For astype(), the accepted values for errors are 'raise' (the default) and 'ignore'.
With 'ignore', the exception is suppressed and the original object is returned unchanged.
Note that astype() has no 'coerce' option; to replace values that cannot be converted with NaN (missing values), use pd.to_numeric() with errors='coerce' instead.
For example:
# create a sample DataFrame
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], ["a", "b", "c"]])
# try to convert a column of strings to int
df[2] = df[2].astype(int, errors='ignore')
# check the data types of the columns
print(df.dtypes)
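In this case, the conversion fails, so the column is returned unchanged. Each column holds a mix of integers and strings, so all three are object, and the output will be: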
0 object
1 object
2 object
dtype: object
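astype() itself does not accept errors='coerce'. To turn values that cannot be converted into NaN, a common alternative is pd.to_numeric(). A minimal sketch, using an illustrative Series named mixed:

```python
import pandas as pd

# a column mixing integers and a non-numeric string, like column 2 above
mixed = pd.Series([3, 6, "c"])

# values that cannot be parsed become NaN instead of raising an error
numeric = pd.to_numeric(mixed, errors="coerce")

print(numeric)
print(numeric.dtype)
```

Because NaN is a floating-point value, the result is a float column even though the valid entries were integers.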