Coding Ref

How to join two DataFrames in Pandas

How to join two DataFrames in Pandas

Joining two DataFrames in Pandas is a common operation when working with data in Python.

In Pandas, there are several ways to join two DataFrames, depending on the type of join and the data you want to include in the resulting DataFrame.

We will go over the different ways to join two DataFrames in Pandas, including how to customize the join and add labels and annotations.

Using merge

To join two DataFrames in Pandas, we can use the merge() function. This function takes a number of optional parameters that allow us to specify the type of join, the columns to join on, and how to handle any conflicts or missing data.

Here is an example of how to use the merge() function to join two DataFrames:

main.py
import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [1, 8, 9], 'D': [10, 11, 12]})

# Use merge() to join the DataFrames
df3 = pd.merge(df1, df2)

# view the merged DataFrame
print(df3)
output
	 A	 B	 D
0	 1	 4	10

This will create a new DataFrame, df3, that contains the data from both df1 and df2.

By default, the merge() function will use an inner join, which means that only the rows that appear in both df1 and df2 will be included in the resulting DataFrame.

Using the how parameter of merge

To customize the join, we can use the how parameter of the merge() function.

This parameter takes one of the following values: 'inner', 'outer', 'left', or 'right'.

  1. The 'inner' value specifies an inner join, as we saw in the previous example.
  2. The 'outer' value specifies an outer join, which includes all the rows from both df1 and df2, even if they don't match up.
  3. The 'left' value specifies a left join, which includes all the rows from df1 and only the matching rows from df2.
  4. The 'right' value specifies a right join, which includes all the rows from df2 and only the matching rows from df1.

Here is an example of how to use the how parameter to specify a different type of join:

main.py
# Use merge() to join the DataFrames with an outer join
df4 = pd.merge(df1, df2, how='outer')
output
	 A	  B	    D
0	 1	4.0	 10.0
1	 2	5.0	  NaN
2	 3	6.0	  NaN
3	 8	NaN	 11.0
4	 9	NaN	 12.0

Using the on parameter of merge

In addition to specifying the type of join, we can also use the on parameter of the merge() function to specify the columns to join on.

This parameter takes a column name or a list of column names, and the join will be performed using these columns.

Here is an example of how to use the on parameter to specify the columns to join on:

main.py
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [1, 8, 9], 'B': [4, 5, 8], 'D': [10, 11, 12]})

# Use merge() to join the DataFrames on columns 'A' and 'B'
df5 = pd.merge(df1, df2, on=['A', 'B'])

print(df5)
output
   A  B   D
0  1  4  10

You'll also like

Related tutorials curated for you

    How to select multiple columns in Pandas

    How to create a bar chart in Pandas

    How to drop duplicate rows in Pandas

    How to sort a series in Pandas

    How to use ffill() in Pandas

    How to find the minimum in Pandas

    How to create a freqeuncy table in Pandas

    How to use intertuples() in Pandas

    What is .notnull in Pandas?

    How to normalize a column in Pandas

    How to get the first row in Pandas

    How to groupby, then sort within groups in Pandas