Joining two DataFrames in Pandas is a common operation when working with data in Python.
In Pandas, there are several ways to join two DataFrames, depending on the type of join and the data you want to include in the resulting DataFrame.
We will go over the different ways to join two DataFrames in Pandas, including how to customize the join and add labels and annotations.
merge
To join two DataFrames in Pandas, we can use the merge()
function. This function takes a number of optional parameters that allow us to specify the type of join, the columns to join on, and how to handle any conflicts or missing data.
Here is an example of how to use the merge()
function to join two DataFrames:
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [1, 8, 9], 'D': [10, 11, 12]})
# Use merge() to join the DataFrames
df3 = pd.merge(df1, df2)
# view the merged DataFrame
print(df3)
A B D
0 1 4 10
This will create a new DataFrame, df3
, that contains the data from both df1
and df2
.
By default, the merge()
function will use an inner join, which means that only the rows that appear in both df1
and df2
will be included in the resulting DataFrame.
how
parameter of merge
To customize the join, we can use the how
parameter of the merge()
function.
This parameter takes one of the following values: 'inner'
, 'outer'
, 'left'
, or 'right'
.
'inner'
value specifies an inner join, as we saw in the previous example.'outer'
value specifies an outer join, which includes all the rows from both df1
and df2
, even if they don't match up.'left'
value specifies a left join, which includes all the rows from df1
and only the matching rows from df2
.'right'
value specifies a right join, which includes all the rows from df2
and only the matching rows from df1
.Here is an example of how to use the how
parameter to specify a different type of join:
# Use merge() to join the DataFrames with an outer join
df4 = pd.merge(df1, df2, how='outer')
A B D
0 1 4.0 10.0
1 2 5.0 NaN
2 3 6.0 NaN
3 8 NaN 11.0
4 9 NaN 12.0
on
parameter of merge
In addition to specifying the type of join, we can also use the on
parameter of the merge()
function to specify the columns to join on.
This parameter takes a column name or a list of column names, and the join will be performed using these columns.
Here is an example of how to use the on
parameter to specify the columns to join on:
import pandas as pd
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [1, 8, 9], 'B': [4, 5, 8], 'D': [10, 11, 12]})
# Use merge() to join the DataFrames on columns 'A' and 'B'
df5 = pd.merge(df1, df2, on=['A', 'B'])
print(df5)
A B D
0 1 4 10
Related tutorials curated for you
How to select multiple columns in Pandas
How to create a bar chart in Pandas
How to drop duplicate rows in Pandas
How to sort a series in Pandas
How to use ffill() in Pandas
How to find the minimum in Pandas
How to create a freqeuncy table in Pandas
How to use intertuples() in Pandas
What is .notnull in Pandas?
How to normalize a column in Pandas
How to get the first row in Pandas
How to groupby, then sort within groups in Pandas