Python offers several ways to aggregate data, most commonly through the pandas and NumPy libraries. The data must already be in a DataFrame, or be converted to one, before the aggregation functions can be applied.
Applying Aggregations on DataFrame
Let us create a DataFrame and apply aggregations on it.
import pandas as pd
import numpy as np
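# 10 rows of random values, indexed by consecutive days
database = pd.DataFrame(np.random.randn(10, 4),
                        index=pd.date_range('1/1/2020', periods=10),
                        columns=['A', 'B', 'C', 'D'])
print(database)

# Rolling window of 3 rows; min_periods=1 yields a result from the first row on
r = database.rolling(window=3, min_periods=1)
print(r)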
Because the values are random, your numbers will differ. The output looks similar to the following:
A B C D
2020-01-01 1.946546 -0.032165 -2.566516 -0.606506
2020-01-02 0.065644 -0.698798 -0.065460 0.654066
2020-01-03 -0.890656 -0.065168 0.964065 -2.650465
2020-01-04 1.897906 1.984065 -0.984066 1.987684
2020-01-05 0.987065 -0.065654 0.897650 -0.890646
2020-01-06 0.897406 0.031664 -1.984650 0.121640
2020-01-07 0.984650 -0.987106 -1.987065 0.804650
2020-01-08 0.984006 -1.894065 0.984065 -1.065064
2020-01-09 1.980656 -0.056497 0.650652 -0.894056
2020-01-10 0.260569 1.984065 0.206054 -1.894560
Rolling [window=3,min_periods=1,center=False,axis=0]
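To make the effect of window and min_periods concrete, here is a minimal sketch that uses a small Series of fixed values instead of random data, so each window can be checked by hand. The names s, partial and strict are illustrative and do not appear in the example above.

```python
import pandas as pd

# Fixed values so every window can be verified by hand.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# min_periods=1: a result is produced as soon as one value is in the window,
# so the first two rows sum over fewer than three values.
partial = s.rolling(window=3, min_periods=1).sum()

# Without min_periods, a fixed integer window needs all 3 values,
# so the first two rows are NaN.
strict = s.rolling(window=3).sum()

print(partial.tolist())  # [1.0, 3.0, 6.0, 9.0, 12.0]
print(strict.tolist())   # [nan, nan, 6.0, 9.0, 12.0]
```

With min_periods=1 the first rows are sums over a shorter window rather than missing values, which is why the rolling results in this tutorial contain no NaN entries.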
We can aggregate by passing a function to the rolling object of the entire DataFrame, or first select one or more columns with the standard square-bracket (get item) syntax and aggregate only those.
Apply Aggregation on a Whole DataFrame
import pandas as pd
import numpy as np
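database = pd.DataFrame(np.random.randn(10, 4),
                        index=pd.date_range('1/1/2020', periods=10),
                        columns=['A', 'B', 'C', 'D'])
print(database)

# Rolling window of 3 rows; min_periods=1 yields a result from the first row on
r = database.rolling(window=3, min_periods=1)
# Sum every column over each window
print(r.aggregate(np.sum))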
Its output is as follows −
A B C D
2020-01-01 1.946546 -0.032165 -2.566516 -0.606506
2020-01-02 0.065644 -0.698798 -0.065460 0.654066
2020-01-03 -0.890656 -0.065168 0.964065 -2.650465
2020-01-04 1.897906 1.984065 -0.984066 1.987684
2020-01-05 0.987065 -0.065654 0.897650 -0.890646
2020-01-06 0.897406 0.031664 -1.984650 0.121640
2020-01-07 0.984650 -0.987106 -1.987065 0.804650
2020-01-08 0.984006 -1.894065 0.984065 -1.065064
2020-01-09 1.980656 -0.056497 0.650652 -0.894056
2020-01-10 0.260569 1.984065 0.206054 -1.894560
A B C D
2020-01-01 1.946546 -0.032165 -2.566516 -0.606506
2020-01-02 2.012190 -0.730963 -2.631976 0.047560
2020-01-03 1.121534 -0.796131 -1.667911 -2.602905
2020-01-04 1.072894 1.220099 -0.085461 -0.008715
2020-01-05 1.994315 1.853243 0.877649 -1.553427
2020-01-06 3.782377 1.950075 -2.071066 1.218678
2020-01-07 2.869121 -1.021096 -3.074065 0.035644
2020-01-08 2.866062 -2.849507 -2.987650 -0.138774
2020-01-09 3.949312 -2.937668 -0.352348 -1.154470
2020-01-10 3.225231 0.033503 1.840771 -3.853680
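The whole-DataFrame case can also be sketched with fixed values, which makes the result easy to verify. This is only an illustration: the frame and its values are made up for this sketch, and it passes the string alias "sum" rather than np.sum, which pandas also accepts as an aggregation name.

```python
import pandas as pd

# Small, fixed frame (hypothetical values chosen for easy mental arithmetic).
frame = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]},
    index=pd.date_range("2020-01-01", periods=4),
)

r = frame.rolling(window=3, min_periods=1)

# Aggregating the rolling object sums every column over the same 3-row window.
summed = r.aggregate("sum")
print(summed)

print(summed["A"].tolist())  # [1.0, 3.0, 6.0, 9.0]
print(summed["B"].tolist())  # [10.0, 30.0, 60.0, 90.0]
```

Each column is aggregated independently, so the result has the same index and the same columns as the input frame.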
Apply Aggregation on a Single Column of a DataFrame
import pandas as pd
import numpy as np
database = pd.DataFrame(np.random.randn(10, 4),
                        index=pd.date_range('1/1/2020', periods=10),
                        columns=['A', 'B', 'C', 'D'])
print(database)

r = database.rolling(window=3, min_periods=1)
# Select column 'A' from the rolling object, then sum over each window
print(r['A'].aggregate(np.sum))
Its output is as follows −
A B C D
2020-01-01 1.946546 -0.032165 -2.566516 -0.606506
2020-01-02 0.065644 -0.698798 -0.065460 0.654066
2020-01-03 -0.890656 -0.065168 0.964065 -2.650465
2020-01-04 1.897906 1.984065 -0.984066 1.987684
2020-01-05 0.987065 -0.065654 0.897650 -0.890646
2020-01-06 0.897406 0.031664 -1.984650 0.121640
2020-01-07 0.984650 -0.987106 -1.987065 0.804650
2020-01-08 0.984006 -1.894065 0.984065 -1.065064
2020-01-09 1.980656 -0.056497 0.650652 -0.894056
2020-01-10 0.260569 1.984065 0.206054 -1.894560
2020-01-01 1.946546
2020-01-02 2.012190
2020-01-03 1.121534
2020-01-04 1.072894
2020-01-05 1.994315
2020-01-06 3.782377
2020-01-07 2.869121
2020-01-08 2.866062
2020-01-09 3.949312
2020-01-10 3.225231
Freq: D, Name: A, dtype: float64
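Selecting a single column changes the shape of the result: the aggregation returns a Series rather than a DataFrame. A minimal sketch, assuming a small made-up frame with fixed values (frame and col_a are illustrative names):

```python
import pandas as pd

# Hypothetical fixed values so the sums can be checked by hand.
frame = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]},
    index=pd.date_range("2020-01-01", periods=4),
)
r = frame.rolling(window=3, min_periods=1)

# Indexing the rolling object with one label restricts it to that column.
col_a = r["A"].aggregate("sum")

print(type(col_a).__name__)  # Series
print(col_a.name)            # A
print(col_a.tolist())        # [1.0, 3.0, 6.0, 9.0]
```

The Series keeps the column label as its name, which is where the "Name: A" line in the printed output comes from.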
Apply Aggregation on Multiple Columns of a DataFrame
import pandas as pd
import numpy as np
database = pd.DataFrame(np.random.randn(10, 4),
                        index=pd.date_range('1/1/2020', periods=10),
                        columns=['A', 'B', 'C', 'D'])
print(database)

r = database.rolling(window=3, min_periods=1)
# Select columns 'A' and 'B' with a list, then sum over each window
print(r[['A', 'B']].aggregate(np.sum))
Its output is as follows −
A B C D
2020-01-01 1.946546 -0.032165 -2.566516 -0.606506
2020-01-02 0.065644 -0.698798 -0.065460 0.654066
2020-01-03 -0.890656 -0.065168 0.964065 -2.650465
2020-01-04 1.897906 1.984065 -0.984066 1.987684
2020-01-05 0.987065 -0.065654 0.897650 -0.890646
2020-01-06 0.897406 0.031664 -1.984650 0.121640
2020-01-07 0.984650 -0.987106 -1.987065 0.804650
2020-01-08 0.984006 -1.894065 0.984065 -1.065064
2020-01-09 1.980656 -0.056497 0.650652 -0.894056
2020-01-10 0.260569 1.984065 0.206054 -1.894560
A B
2020-01-01 1.946546 -0.032165
2020-01-02 2.012190 -0.730963
2020-01-03 1.121534 -0.796131
2020-01-04 1.072894 1.220099
2020-01-05 1.994315 1.853243
2020-01-06 3.782377 1.950075
2020-01-07 2.869121 -1.021096
2020-01-08 2.866062 -2.849507
2020-01-09 3.949312 -2.937668
2020-01-10 3.225231 0.033503
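When a list of columns is selected, only those columns appear in the result. The following sketch illustrates this with a small made-up frame (the frame, its values and the name subset are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical fixed values; column C is present but not selected below.
frame = pd.DataFrame(
    {
        "A": [1.0, 2.0, 3.0, 4.0],
        "B": [10.0, 20.0, 30.0, 40.0],
        "C": [100.0, 200.0, 300.0, 400.0],
    },
    index=pd.date_range("2020-01-01", periods=4),
)
r = frame.rolling(window=3, min_periods=1)

# A list of labels keeps the result a DataFrame, limited to those columns.
subset = r[["A", "B"]].aggregate("sum")

print(list(subset.columns))   # ['A', 'B']
print(subset["B"].tolist())   # [10.0, 30.0, 60.0, 90.0]
```

Note the double brackets: the outer pair is the indexing operation and the inner pair is the list of column labels.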