
concat()

The concat() function in Pandas combines multiple DataFrames or Series along an axis, either stacking rows or placing columns side by side. It offers options for handling datasets with different sizes, columns, or indices. This documentation walks through its parameters, use cases, and practical examples.


Introduction

In real-world scenarios, data often comes from various sources and may not have consistent structures. You might encounter datasets with:

  1. Different indices.
  2. Varying column names.
  3. Overlapping or completely distinct data points.

The concat() function combines these datasets into a single DataFrame. It aligns them on their labels and fills positions that have no data with NaN, so it works whether or not the inputs overlap.


Syntax

pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, ...) -> DataFrame or Series

Parameters:

  • objs: List or tuple of DataFrames or Series to concatenate.
  • axis: 0 (default) for vertical concatenation, 1 for horizontal concatenation.
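  • join: Specifies how to handle labels on the other axis (columns when axis=0, index when axis=1):
    • "outer" (default): Union of all columns or indices.
    • "inner": Intersection of columns or indices.
  • ignore_index: Boolean. If True, discards the existing labels along the concatenation axis and assigns a default integer index (0, 1, ..., n-1).
  • keys: Sequence of labels, one per input object, used to build the outermost level of a multi-level index.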
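The return type depends on the inputs and the axis. As a minimal sketch (the Series names and values here are made up for illustration), stacking only Series along axis=0 gives a Series, while axis=1 gives a DataFrame:

```python
import pandas as pd

# Two small illustrative Series (not part of the examples below)
s1 = pd.Series([1, 2], name="x")
s2 = pd.Series([3, 4], name="x")

# axis=0 with only Series as input returns a Series
stacked = pd.concat([s1, s2])

# axis=1 places each Series in its own column and returns a DataFrame
side_by_side = pd.concat([s1, s2], axis=1)

print(type(stacked).__name__)
print(type(side_by_side).__name__)
```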

Examples

1. Basic Concatenation

Combine two DataFrames vertically (default behavior):

import pandas as pd

# Define DataFrames
data1 = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}
data2 = {"A": [10, 11, 12], "B": [13, 14, 15], "C": [16, 17, 18]}

# Convert to DataFrames
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

# Concatenate
result = pd.concat([df1, df2])
print(result)

Output:

    A   B   C
0   1   4   7
1   2   5   8
2   3   6   9
0  10  13  16
1  11  14  17
2  12  15  18

Notice that the index labels 0, 1, and 2 are repeated, because concat() keeps each DataFrame's original index by default.
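Repeated labels matter for lookups. The sketch below rebuilds the same two DataFrames and shows that a label-based lookup then matches more than one row. It also uses the verify_integrity parameter of concat(), which raises a ValueError when labels overlap; the variable names rows_at_zero and overlap_detected are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
df2 = pd.DataFrame({"A": [10, 11, 12], "B": [13, 14, 15], "C": [16, 17, 18]})

result = pd.concat([df1, df2])

# Label 0 appears twice, so .loc[0] returns two rows instead of one
rows_at_zero = result.loc[0]
print(rows_at_zero)

# verify_integrity=True makes concat() raise ValueError on overlapping labels
try:
    pd.concat([df1, df2], verify_integrity=True)
    overlap_detected = False
except ValueError:
    overlap_detected = True
print(overlap_detected)
```

If unique row labels are needed, ignore_index=True or keys are the usual ways to get them.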


2. Horizontal Concatenation

Concatenate DataFrames along the horizontal axis by setting axis=1:

result = pd.concat([df1, df2], axis=1)
print(result)

Output:

   A  B  C   A   B   C
0  1  4  7  10  13  16
1  2  5  8  11  14  17
2  3  6  9  12  15  18
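The horizontal result contains each column name twice, which makes selection by name ambiguous. One possible way around this, sketched here with the illustrative labels "left" and "right", is to pass keys together with axis=1 so that the columns get a second level:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
df2 = pd.DataFrame({"A": [10, 11, 12], "B": [13, 14, 15], "C": [16, 17, 18]})

result = pd.concat([df1, df2], axis=1)

# "A" exists twice, so selecting it returns a two-column DataFrame, not a Series
both_a = result["A"]
print(both_a)

# With keys and axis=1, the column labels become two-level: (key, original name)
labeled = pd.concat([df1, df2], axis=1, keys=["left", "right"])
right_a = labeled[("right", "A")]
print(right_a)
```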

3. Adding Keys for Multi-Level Indexing

Use the keys parameter to distinguish between datasets:

result = pd.concat([df1, df2], keys=['data1', 'data2'])
print(result)

Output:

          A   B   C
data1 0   1   4   7
      1   2   5   8
      2   3   6   9
data2 0  10  13  16
      1  11  14  17
      2  12  15  18
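The keys make it possible to pull each source back out of the combined DataFrame. A minimal sketch follows; the level names "source" and "row" passed through the names parameter are chosen here for illustration only.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
df2 = pd.DataFrame({"A": [10, 11, 12], "B": [13, 14, 15], "C": [16, 17, 18]})

# names labels the two index levels (illustrative names)
result = pd.concat([df1, df2], keys=["data1", "data2"], names=["source", "row"])

# Select every row that came from the first DataFrame
first_block = result.loc["data1"]
print(first_block)

# Select one cell using the (key, original label) pair
single = result.loc[("data2", 1), "B"]
print(single)
```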

4. Ignoring Index

Reset the index using ignore_index=True:

result = pd.concat([df1, df2], ignore_index=True)
print(result)

Output:

    A   B   C
0   1   4   7
1   2   5   8
2   3   6   9
3  10  13  16
4  11  14  17
5  12  15  18

5. Different Indices

Handle DataFrames with differing indices:

# Modify indices
df1.index = [1, 2, 3]
df2.index = [4, 5, 6]

result = pd.concat([df1, df2])
print(result)

Output:

    A   B   C
1   1   4   7
2   2   5   8
3   3   6   9
4  10  13  16
5  11  14  17
6  12  15  18
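Index labels play a larger role when concatenating horizontally, because rows are then matched by label rather than by position. The sketch below uses two small made-up DataFrames whose indices overlap only at label 3 (an assumption for illustration, not the data above):

```python
import pandas as pd

# Illustrative frames: indices overlap only at label 3
left = pd.DataFrame({"A": [1, 2, 3]}, index=[1, 2, 3])
right = pd.DataFrame({"B": [10, 11, 12]}, index=[3, 4, 5])

# Outer join (default): union of labels, gaps filled with NaN
aligned = pd.concat([left, right], axis=1)
print(aligned)

# Inner join: only labels present in both inputs are kept
shared = pd.concat([left, right], axis=1, join="inner")
print(shared)
```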

6. Different Column Names

Concatenate DataFrames with mismatched columns:

# Modify columns
data2 = {"D": [10, 11, 12], "E": [13, 14, 15], "F": [16, 17, 18]}
df2 = pd.DataFrame(data2)

result = pd.concat([df1, df2], sort=False)
print(result)

Output:

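     A    B    C     D     E     F
1  1.0  4.0  7.0   NaN   NaN   NaN
2  2.0  5.0  8.0   NaN   NaN   NaN
3  3.0  6.0  9.0   NaN   NaN   NaN
0  NaN  NaN  NaN  10.0  13.0  16.0
1  NaN  NaN  NaN  11.0  14.0  17.0
2  NaN  NaN  NaN  12.0  15.0  18.0

The first three rows carry the labels 1, 2, 3 because df1's index was changed in the previous example, while the new df2 has the default index 0, 1, 2.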
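The values in the output above are floats even though the inputs were integers. This happens because NaN cannot be stored in a regular integer column, so pandas upcasts the affected columns to a float dtype. A minimal sketch with smaller made-up frames; filling with 0 is just one possible choice and may not suit every dataset:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3]})
df2 = pd.DataFrame({"D": [10, 11, 12]})

result = pd.concat([df1, df2], ignore_index=True)

# Integer input columns become float columns once NaN is introduced
print(df1.dtypes)
print(result.dtypes)

# One option: replace the gaps, then convert back to integers
filled = result.fillna(0).astype("int64")
print(filled)
```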

7. Join Options

Outer Join (default):

Includes all columns:

result = pd.concat([df1, df2], join='outer')
print(result)

Inner Join:

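Keeps only the columns present in every DataFrame. Because df1 and df2 share no column names at this point, the result here has no columns, only the combined index: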

result = pd.concat([df1, df2], join='inner')
print(result)
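The difference between the two join modes is easier to see when the inputs share some columns but not all. The following sketch uses two made-up DataFrames, left and right, that have only column B in common:

```python
import pandas as pd

# Illustrative frames that share only column "B"
left = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
right = pd.DataFrame({"B": [5, 6], "C": [7, 8]})

# Outer join keeps A, B, and C; missing cells become NaN
outer = pd.concat([left, right], join="outer", ignore_index=True)
print(outer)

# Inner join keeps only B, the one column found in both inputs
inner = pd.concat([left, right], join="inner", ignore_index=True)
print(inner)
```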

Conclusion

The concat() function in Pandas is essential for combining datasets in flexible and efficient ways. By understanding its parameters and behaviors, you can handle various real-world data integration tasks with ease.