By Hemanta Sundaray on 2022-05-09
We can concatenate two or more DataFrames (or Series) together, both vertically and horizontally using the concat function.
Let’s read two different Excel worksheets into two different DataFrames:
Import pandas as pd
Import matplotlib.pyplot as plt
%matplotlib notebook
data1 = pd.read_excel("dataOne.xlsx")
data2 = pd.read_excel("dataTwo.xlsx")
data1
data2
Next, we will make the SITE CODE column serve as the index of both of our DataFrames, using the set_index() method.
data1.set_index(keys="SITE CODE", inplace=True)
data1
data2.set_index(keys="SITE CODE", inplace=True)
data2
Let’s place both the DataFrames into a single list, and then call the concat function to concatenate them together:
pd.concat([data1, data2])
The first argument is the only argument required for the concat function and it must be a sequence of pandas objects, typically a list or dictionary of DataFrames or Series.
By default, the concat function concatenates DataFrames vertically, one on top of the other.
When we concatenate vertically, the DataFrames align by their column names. Notice that all the column names that were the same in both the DataFramess lined up precisely under the same column name.
It is also possible to concatenate horizontally by changing the axis parameter to columns or 1:
pd.concat([data1, data2], axis='columns')
The concat function, by default, uses an outer join, keeping all rows from each DataFrame in the list. However, it gives us options to only keep rows that have the same index values in both DataFrames. This is referred to as an inner join. We set the join parameter to inner to change the behavior:
pd.concat([data1, data2], join='inner', axis='columns')