In statistics, central tendency describes the center, or typical location, of the values in a data set. A measure of central tendency summarizes the data with a single representative value,
such as the average. It does not by itself describe how widely the values are spread, but knowing the typical value helps in judging whether a new input fits the existing data set.
There are three main measures of central tendency, each of which can be calculated with methods from the pandas Python library.
- Mean - The average value of the data: the sum of the values divided by the number of values.
- Median - The middle value of the distribution when the values are arranged in ascending or descending order.
- Mode - The most frequently occurring value in a distribution.
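The three definitions above can be sketched in plain Python before turning to pandas. This is only an illustration, not how pandas computes them internally. The variable names are assumptions chosen for this example, and the numbers are the Age values from the example data used later in this article.

```python
from collections import Counter

# Age values from the article's example data
ages = [21, 22, 24, 28, 19, 18, 20, 32, 32, 35, 42, 29]

# Mean: sum of the values divided by the number of values
mean_age = sum(ages) / len(ages)

# Median: middle value of the sorted data; with an even count,
# the average of the two middle values
ordered = sorted(ages)
n = len(ordered)
mid = n // 2
if n % 2 == 1:
    median_age = ordered[mid]
else:
    median_age = (ordered[mid - 1] + ordered[mid]) / 2

# Mode: every value that reaches the highest frequency
counts = Counter(ages)
top_count = max(counts.values())
mode_ages = sorted(value for value, count in counts.items() if count == top_count)

print(mean_age, median_age, mode_ages)
```

Keeping the mode as a list reflects that a data set can have more than one mode when several values share the highest frequency.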
Calculating Mean and Median
The pandas methods mean() and median() can be called directly on a DataFrame to calculate these values.

    import pandas as pd

    # Create a dictionary of Series
    d = {'Name': pd.Series(['Raj', 'Sham', 'Tusar', 'Ram', 'Jaggu', 'Karishma', 'Pranab',
                            'Shami', 'Chahal', 'Dhoni', 'Sachin', 'Bumrah']),
         'Age': pd.Series([21, 22, 24, 28, 19, 18, 20, 32, 32, 35, 42, 29]),
         'Rating': pd.Series([4.17, 3.89, 3.63, 5.00, 1.98, 1.03, 2.36, 4.00, 3.92, 4.88, 4.99, 4.23])}

    # Create a DataFrame
    database = pd.DataFrame(d)

    # numeric_only=True skips the non-numeric Name column
    print("Mean Values in the Distribution")
    print(database.mean(numeric_only=True))
    print("*******************************")
    print("Median Values in the Distribution")
    print(database.median(numeric_only=True))
Its output is as follows −

    Mean Values in the Distribution
    Age       26.833333
    Rating     3.673333
    dtype: float64
    *******************************
    Median Values in the Distribution
    Age       26.00
    Rating     3.96
    dtype: float64

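The same methods also work on a single column. As a minimal sketch, assuming a DataFrame that holds only the Age and Rating columns from the example data, the snippet below computes one statistic per column and then uses agg() to collect both statistics in one table. The variable names are illustrative.

```python
import pandas as pd

# Numeric columns from the article's example data
database = pd.DataFrame({
    'Age': [21, 22, 24, 28, 19, 18, 20, 32, 32, 35, 42, 29],
    'Rating': [4.17, 3.89, 3.63, 5.00, 1.98, 1.03, 2.36, 4.00, 3.92, 4.88, 4.99, 4.23],
})

# Selecting a column gives a Series; mean() and median() then return a single number
age_mean = database['Age'].mean()
rating_median = database['Rating'].median()

# agg() with a list of method names returns one row per statistic
summary = database.agg(['mean', 'median'])

print(age_mean, rating_median)
print(summary)
```

The summary table is indexed by statistic name, so a single entry can be read with, for example, summary.loc['median', 'Age'].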
Calculating Mode
A distribution may or may not have a meaningful mode: continuous data rarely repeats a value exactly, and several values can share the maximum frequency. We take a simple distribution below
to find the mode. Here one Age value occurs more often than any other.

    import pandas as pd

    # Create a dictionary of Series
    data = {'Name': pd.Series(['Raj', 'Sham', 'Tusar', 'Ram', 'Jaggu', 'Karishma', 'Pranab',
                               'Shami', 'Chahal', 'Dhoni', 'Sachin', 'Bumrah']),
            'Age': pd.Series([21, 22, 24, 28, 19, 18, 20, 32, 32, 35, 42, 29])}

    # Create a DataFrame
    database = pd.DataFrame(data)

    # mode() returns a DataFrame because a column can have several modes
    print(database.mode())
Its output is as follows −
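
            Name   Age
    0     Bumrah  32.0
    1     Chahal   NaN
    2      Dhoni   NaN
    3      Jaggu   NaN
    4   Karishma   NaN
    5     Pranab   NaN
    6        Raj   NaN
    7        Ram   NaN
    8     Sachin   NaN
    9       Sham   NaN
    10     Shami   NaN
    11     Tusar   NaN

Age has a single mode, 32, which occurs twice. Every name occurs exactly once, so all twelve names are returned as modes in sorted order, and the shorter Age column is padded with NaN.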
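To look at the mode of one column without the NaN padding, mode() can be called on a Series instead. The sketch below is an illustration under that assumption: it uses the Age values from the example, checks the frequencies with value_counts(), and adds a small made-up Series (tied) to show that several modes are returned when values tie for the highest frequency.

```python
import pandas as pd

# Age values from the article's example data
ages = pd.Series([21, 22, 24, 28, 19, 18, 20, 32, 32, 35, 42, 29])

# Series.mode() returns a Series, since there may be more than one mode
age_modes = ages.mode()

# value_counts() lists how often each value occurs, most frequent first
frequencies = ages.value_counts()

# Hypothetical data with a tie: 1 and 2 both occur twice
tied = pd.Series([1, 1, 2, 2, 3])
tied_modes = tied.mode()

print(age_modes.tolist())
print(frequencies.head(3))
print(tied_modes.tolist())
```

Because the result is always a Series, code that expects a single mode should decide explicitly what to do with ties, for example by taking the first entry.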