Python – Measuring Central Tendency

A measure of central tendency describes where the values of a data set are centered. It gives an idea of the typical or average value of the data. How widely the values are spread around that center is described separately by measures of dispersion such as variance. Taken together, these help in judging how well a new input fits the existing data set.
There are three main measures of central tendency, all of which can be calculated using methods in the pandas Python library.

  • Mean - The average value of the data: the sum of the values divided by the number of values.
  • Median - The middle value of the distribution when the values are arranged in ascending or descending order. With an even number of values, it is the average of the two middle values.
  • Mode - The most frequently occurring value in the distribution.
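The three definitions above can be sketched in plain Python before turning to pandas. This is a minimal illustration only: the list `values` is toy data chosen for this example, not part of the lesson's data set.

```python
from collections import Counter

# Toy data chosen for illustration (an assumption, not the lesson's data set).
values = [2, 3, 3, 5, 7]

# Mean: sum of the values divided by the number of values.
mean_value = sum(values) / len(values)

# Median: middle value of the sorted data, or the average of the two
# middle values when the number of values is even.
ordered = sorted(values)
mid = len(ordered) // 2
if len(ordered) % 2 == 1:
    median_value = ordered[mid]
else:
    median_value = (ordered[mid - 1] + ordered[mid]) / 2

# Mode: the value with the highest frequency.
mode_value = Counter(values).most_common(1)[0][0]

print(mean_value, median_value, mode_value)  # 4.0 3 3
```

Note that `most_common(1)` reports only one value even when several values tie for the highest frequency, so this sketch is only suitable for data with a single mode.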

    Calculating Mean and Median

    The pandas functions can be directly used to calculate these values.

    import pandas as pd

    # Create a dictionary of Series
    d = {'Name': pd.Series(['Raj', 'Sham', 'Tusar', 'Ram', 'Jaggu', 'Karishma', 'Pranab',
                            'Shami', 'Chahal', 'Dhoni', 'Sachin', 'Bumrah']),
         'Age': pd.Series([21, 22, 24, 28, 19, 18, 20, 32, 32, 35, 42, 29]),
         'Rating': pd.Series([4.17, 3.89, 3.63, 5.00, 1.98, 1.03, 2.36, 4.00, 3.92, 4.88, 4.99, 4.23])}

    # Create a DataFrame
    database = pd.DataFrame(d)

    # numeric_only=True skips the non-numeric Name column
    print("Mean Values in the Distribution")
    print(database.mean(numeric_only=True))
    print("*******************************")
    print("Median Values in the Distribution")
    print(database.median(numeric_only=True))

    Its output is as follows −

    Mean Values in the Distribution
    Age       26.833333
    Rating     3.673333
    dtype: float64
    *******************************
    Median Values in the Distribution
    Age       26.00
    Rating     3.96
    dtype: float64
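As a cross-check on the figures, the same Age and Rating values can be run through Python's standard `statistics` module. This is a sketch that is independent of pandas. The variable names are chosen for this example, and the values are rounded only for display.

```python
import statistics

# Same values as the Age and Rating columns in the lesson's data set.
ages = [21, 22, 24, 28, 19, 18, 20, 32, 32, 35, 42, 29]
ratings = [4.17, 3.89, 3.63, 5.00, 1.98, 1.03, 2.36, 4.00, 3.92, 4.88, 4.99, 4.23]

age_mean = statistics.mean(ages)          # 322 / 12
age_median = statistics.median(ages)      # average of 24 and 28
rating_mean = statistics.mean(ratings)    # 44.08 / 12
rating_median = statistics.median(ratings)  # average of 3.92 and 4.00

print(round(age_mean, 6), age_median)                  # 26.833333 26.0
print(round(rating_mean, 6), round(rating_median, 2))  # 3.673333 3.96
```

Both columns hold twelve values, so each median is the average of the sixth and seventh values in sorted order.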

    Calculating Mode

    A distribution may or may not have a meaningful mode. Continuous data rarely repeats a value exactly, and several values can share the maximum frequency. We take a simple distribution below to find the mode. Here the Age column has one value with maximum frequency, while every value in the Name column occurs exactly once.

    import pandas as pd

    # Create a dictionary of Series
    data = {'Name': pd.Series(['Raj', 'Sham', 'Tusar', 'Ram', 'Jaggu', 'Karishma', 'Pranab',
                               'Shami', 'Chahal', 'Dhoni', 'Sachin', 'Bumrah']),
            'Age': pd.Series([21, 22, 24, 28, 19, 18, 20, 32, 32, 35, 42, 29])}

    # Create a DataFrame
    database = pd.DataFrame(data)

    # mode() returns every value tied for the highest frequency in each column
    print(database.mode())

    Its output is as follows −

            Name   Age
    0     Bumrah  32.0
    1     Chahal   NaN
    2      Dhoni   NaN
    3      Jaggu   NaN
    4   Karishma   NaN
    5     Pranab   NaN
    6        Raj   NaN
    7        Ram   NaN
    8     Sachin   NaN
    9       Sham   NaN
    10     Shami   NaN
    11     Tusar   NaN
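The shape of this result follows from how pandas handles ties: `mode()` returns every value that shares the highest frequency, in sorted order, and a DataFrame pads the shorter columns with NaN. The sketch below shows this on single Series. The `tied` and `unique` Series are hypothetical data made up for this example.

```python
import pandas as pd

# Age values from the lesson: 32 is the only value that occurs twice.
ages = pd.Series([21, 22, 24, 28, 19, 18, 20, 32, 32, 35, 42, 29])

# Hypothetical data for illustration: two values tie for the top frequency.
tied = pd.Series([1, 2, 2, 3, 3])

# Hypothetical data for illustration: every value occurs exactly once.
unique = pd.Series(['Raj', 'Sham', 'Tusar'])

age_modes = ages.mode().tolist()
tied_modes = tied.mode().tolist()
unique_modes = unique.mode().tolist()

print(age_modes)     # [32]
print(tied_modes)    # [2, 3]
print(unique_modes)  # ['Raj', 'Sham', 'Tusar']
```

When every value is unique, as in the Name column, all values are returned, which is why that column fills all twelve rows while Age fills only the first.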