Join two DataFrames on a column

dataframe_merge_by_column

Skill - Join two DataFrames on a column

Table of Contents

Skills Required

Please make sure to have all the skills mentioned above to understand and execute the code mentioned below. Go through the above skills if necessary for reference or revision


Pandas is a python library.
DataFrame is a data structure provided by the pandas library.

Please go through Pandas DataFrame Basics to learn the basics of pandas DataFrame.

merge dataframes

In this post, we will learn how to join two DataFrames using merge function

In order to merge a DataFrame with another either it’s index or column can be used


Merge on index

# import pandas module
import pandas as pd

# create a dataframe
df1 = pd.DataFrame({'cleft': ['foo', 'bar', 'baz', 'foo'],'value': [1, 2, 3, 5]})
print(df1)
"""
prints
   cleft  value
0   foo      1
1   bar      2
2   baz      3
3   foo      5
"""

# create another dataframe
df2 = pd.DataFrame({'cright': ['foo', 'bar', 'baz', 'foo'],'value': [5, 6, 7, 8]})
print(df2)
"""
prints
    cright  value
0    foo      5
1    bar      6
2    baz      7
3    foo      8
"""

# merge dataframes on index
df3 = df1.merge(df2, left_index=True, right_index=True)
print(df3)
"""
this prints
   cleft  value_x cright  value_y
0   foo        1    foo        5
1   bar        2    bar        6
2   baz        3    baz        7
3   foo        5    foo        8
"""

# merge dataframes on index and use custom suffix for overlapping columns
df3 = df1.merge(df2, left_index=True, right_index=True, suffixes=('_lft', '_rgt'))
print(df3)
"""
this prints
    cleft  value_lft cright  value_rgt
0   foo          1    foo          5
1   bar          2    bar          6
2   baz          3    baz          7
3   foo          5    foo          8
"""

We can see that merge function adds suffixes (which are also configurable) to overlapping columns in the output DataFrame

Merge on column names or a common column name

# import pandas module
import pandas as pd

# create a dataframe
df1 = pd.DataFrame({'cleft': ['foo', 'bar', 'baz', 'foo'],'value': [1, 2, 3, 5]})

# create another dataframe
df2 = pd.DataFrame({'cright': ['foo', 'bar', 'baz', 'foo'],'value': [5, 6, 7, 8]})

# merge dataframes on column names
df3 = df1.merge(df2, left_on='cleft', right_on='cright')
print(df3)
"""
this prints
    cleft  value_x cright  value_y
0   foo        1    foo        5
1   foo        1    foo        8
2   foo        5    foo        5
3   foo        5    foo        8
4   bar        2    bar        6
5   baz        3    baz        7
"""

# merge on a column name common to both dataframes
df3 = df1.merge(df2, on='value')
print(df3)
"""
this prints
  cleft value cright
0 foo 5 foo
"""

In the above example, only one row is present, since only one row has came ‘value’ column in both DataFrames.
This is called an inner join in which only rows with common values of the merging columns are present in the output DataFrame

Type of join in merge function using ‘how’ input

  • If how = 'inner', then only rows with same values in the joining columns of both DataFrames are considered in output.
  • If how = 'outer', then all the rows will be considered in the output
  • If how = 'left', then all the rows of the left DataFrame will be considered in the output. If the values in right DataFrame will be NaN if corresponding column value is not present
  • If how = 'right', then all the rows of the right DataFrame will be considered in the output. If the values in left DataFrame will be NaN if corresponding column value is not present
# import pandas
import pandas as pd

# create a dataframe
df1 = pd.DataFrame({'cleft': ['foo', 'bar', 'baz', 'foo'],'value': [1, 2, 3, 5]})

# create another dataframe
df2 = pd.DataFrame({'cright': ['foo', 'bar', 'baz', 'foo'],'value': [5, 6, 7, 8]})

# merge on a column name with join type as 'inner'
df3 = df1.merge(df2, on='value')
print(df3)
"""
this prints
  cleft value cright
0 foo 5 foo
"""

# merge on a column name with join type as 'outer'
df3 = df1.merge(df2, on='value', how='outer')
print(df3)
"""
this prints
  cleft  value cright
0   foo      1    NaN
1   bar      2    NaN
2   baz      3    NaN
3   foo      5    foo
4   NaN      6    bar
5   NaN      7    baz
6   NaN      8    foo
"""

# merge on a column name with join type as 'outer'
df3 = df1.merge(df2, on='value', how='left')
print(df3)
"""
this prints
  cleft  value cright
0   foo      1    NaN
1   bar      2    NaN
2   baz      3    NaN
3   foo      5    foo
"""

# merge on a column name with join type as 'outer'
df3 = df1.merge(df2, on='value', how='right')
print(df3)
"""
this prints
  cleft  value cright
0   foo      5    foo
1   NaN      6    bar
2   NaN      7    baz
3   NaN      8    foo
"""

Please read this official documentation for getting know about more options and examples


Video

Video for this post can be found here

Online Interpreter

Although we recommend to practice the above examples in Visual Studio Code, you can run these examples online at https://www.tutorialspoint.com/execute_python_online.php

References


Table of Contents

Comments