3.2. Pandas DataFrame diff() Method

The pandas diff() method computes the difference between each element of a DataFrame and another element of the same DataFrame. By default, the other element is the one in the same column of the previous row.

Syntax: DataFrame.diff(periods=1, axis=0)

Parameters:

  1. periods: int, default 1. Periods to shift for calculating difference, accepts negative values.

  2. axis: {0 or ‘index’, 1 or ‘columns’}, default 0. Take difference over rows (0) or columns (1).

Returns: a DataFrame of the same size and shape as the input (a Series when called on a Series).
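Since periods accepts negative values, a short sketch may help show what the sign changes. The frame `demo` and its column `x` are hypothetical names used only for this illustration.

```python
import pandas as pd

# Hypothetical one-column frame, used only to illustrate the sign of periods
demo = pd.DataFrame({'x': [1, 4, 9, 16]})

# periods=1: each element minus the element one row above it
backward = demo.diff(periods=1)

# periods=-1: each element minus the element one row below it
forward = demo.diff(periods=-1)

print(backward['x'].tolist())  # [nan, 3.0, 5.0, 7.0]
print(forward['x'].tolist())   # [-3.0, -5.0, -7.0, nan]
```

With a positive value the NaN appears at the top of the column, and with a negative value it appears at the bottom, because that is where no partner row exists.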

# Example: diff
import pandas as pd

# Create a dataframe
df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6],
                   'B': [1, 1, 2, 3, 5, 8],
                   'C': [1, 4, 9, 16, 25, 36]})
print(df)
   A  B   C
0  1  1   1
1  2  1   4
2  3  2   9
3  4  3  16
4  5  5  25
5  6  8  36

Calculate the difference between the current and a shifted row.

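The default is to shift by one row, so each row is compared with the row directly above it. The first row has no previous row, so its result is NaN.

Step by step, for the row at index 1:
1. current row (index 1): 2, 1, 4
2. shifted row (index 0): 1, 1, 1
3. difference: 1, 0, 3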
# Calculate the difference between the current and a shifted row

print(df.diff())
     A    B     C
0  NaN  NaN   NaN
1  1.0  0.0   3.0
2  1.0  1.0   5.0
3  1.0  1.0   7.0
4  1.0  2.0   9.0
5  1.0  3.0  11.0
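The "current row minus shifted row" description can be checked directly with shift(). The sketch below rebuilds the same frame so that it runs on its own; the names `shifted` and `manual` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6],
                   'B': [1, 1, 2, 3, 5, 8],
                   'C': [1, 4, 9, 16, 25, 36]})

# shift(1) moves every row down by one position, leaving NaN in the first row
shifted = df.shift(1)

# Subtracting the shifted frame from the original gives the row-wise differences
manual = df - shifted

# Compare the manual result with the default diff()
print(manual.equals(df.diff()))
```

The integer columns become floats in both results because NaN, which marks the missing first row, cannot be stored in an integer column.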

Calculate the difference between the current and a shifted row of period 2.

periods=2 means each row is compared with the row two positions above it, so the first two rows are NaN.

Step by step, for the row at index 5:
1. current row (index 5): 6, 8, 36
2. shifted row (index 3): 4, 3, 16
3. difference: 2, 5, 20
# Calculate the difference between the current and a shifted row of period 2

df.diff(periods=2)
     A    B     C
0  NaN  NaN   NaN
1  NaN  NaN   NaN
2  2.0  1.0   8.0
3  2.0  2.0  12.0
4  2.0  3.0  16.0
5  2.0  5.0  20.0
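The last row of this output can be reproduced by hand with iloc. This is a minimal self-contained sketch; `by_hand` is just an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6],
                   'B': [1, 1, 2, 3, 5, 8],
                   'C': [1, 4, 9, 16, 25, 36]})

result = df.diff(periods=2)

# Row at index 5 minus the row two positions above it (index 3)
by_hand = df.iloc[5] - df.iloc[3]

print(by_hand.tolist())          # [2, 5, 20]
print(result.iloc[5].tolist())   # [2.0, 5.0, 20.0]
```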

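Calculate the difference between the current and a shifted column (axis=1).

With axis=0 (the default), each row is compared with an earlier row, so differences run down each column. With axis=1, each column is compared with an earlier column, so differences run across each row.

Step by step, for axis=1 and periods=2 (column C minus column A; columns A and B have no column two positions to their left, so they are NaN):
1. current column (C): 1, 4, 9, 16, 25, 36
2. shifted column (A): 1, 2, 3, 4, 5, 6
3. difference: 0, 2, 6, 12, 20, 30
# Calculate the difference between the current and a shifted column with axis=1 and periods=2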

df.diff(periods=2, axis=1)
    A   B   C
0 NaN NaN   0
1 NaN NaN   2
2 NaN NaN   6
3 NaN NaN  12
4 NaN NaN  20
5 NaN NaN  30
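As a cross-check, column C of this result should match an explicit subtraction of column A from column C. A minimal self-contained sketch, with `by_hand` as an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6],
                   'B': [1, 1, 2, 3, 5, 8],
                   'C': [1, 4, 9, 16, 25, 36]})

result = df.diff(periods=2, axis=1)

# Column C minus the column two positions to its left (column A)
by_hand = df['C'] - df['A']

print(by_hand.tolist())                      # [0, 2, 6, 12, 20, 30]
print((result['C'] == by_hand).all())        # True
print(result[['A', 'B']].isna().all().all()) # True
```

The comparison uses == rather than equals() so that it holds whether column C of the result is stored as integers or as floats, which may vary between pandas versions.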