Concatenating Multiple DataFrames Together Using concat Function

By Hemanta Sundaray on 2022-05-09

We can concatenate two or more DataFrames (or Series) together, both vertically and horizontally using the concat function.

Let’s read two different Excel worksheets into two different DataFrames:

Import pandas as pd

Import matplotlib.pyplot as plt

%matplotlib notebook

data1 = pd.read_excel("dataOne.xlsx")

data2 = pd.read_excel("dataTwo.xlsx")
data1

Data

data2

Data

Next, we will make the SITE CODE column serve as the index of both of our DataFrames, using the set_index() method.

data1.set_index(keys="SITE CODE", inplace=True)

data1

Data

data2.set_index(keys="SITE CODE", inplace=True)

data2

Data

Let’s place both the DataFrames into a single list, and then call the concat function to concatenate them together:

pd.concat([data1, data2])

The first argument is the only argument required for the concat function and it must be a sequence of pandas objects, typically a list or dictionary of DataFrames or Series.

Data

By default, the concat function concatenates DataFrames vertically, one on top of the other.

When we concatenate vertically, the DataFrames align by their column names. Notice that all the column names that were the same in both the DataFramess lined up precisely under the same column name.

It is also possible to concatenate horizontally by changing the axis parameter to columns or 1:

pd.concat([data1, data2], axis='columns')

Data

The concat function, by default, uses an outer join, keeping all rows from each DataFrame in the list. However, it gives us options to only keep rows that have the same index values in both DataFrames. This is referred to as an inner join. We set the join parameter to inner to change the behavior:

pd.concat([data1, data2], join='inner', axis='columns')

Data

Join the Newsletter