Python – Pandas

Pandas is an open-source Python Library used for high-performance data manipulation and data analysis using its powerful data structures. Python with pandas is in use in a variety of academic and commercial domains, including Finance, Economics, Statistics, Advertising, Web Analytics, and more.
Using Pandas, we can accomplish five typical steps in the processing and analysis of data, regardless of its origin: load, organize, manipulate, model, and analyse.
Below are some of the important features of Pandas that are used specifically for data processing and data analysis work.

Key Features of Pandas

  • Fast and efficient DataFrame object with default and customized indexing.
  • Tools for loading data into in-memory data objects from different file formats.
  • Data alignment and integrated handling of missing data.
  • Reshaping and pivoting of data sets.
  • Label-based slicing, indexing and subsetting of large data sets.
  • Columns from a data structure can be deleted or inserted.
  • Group by data for aggregation and transformations.
  • High performance merging and joining of data.
  • Time Series functionality.
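A few of the features listed above can be sketched in a handful of lines. This is only an illustration under assumptions: the `sales` and `managers` tables, their column names, and all values are made up for the example and do not come from any real data set.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records; names and numbers are invented for illustration.
sales = pd.DataFrame(
    {
        "region": ["North", "South", "North", "South"],
        "units": [10, np.nan, 7, 5],
    }
)

# Integrated handling of missing data: replace the missing value with 0.
sales["units"] = sales["units"].fillna(0)

# Insert a column, then delete it again.
sales["double_units"] = sales["units"] * 2
sales = sales.drop(columns=["double_units"])

# Group by a key and aggregate.
totals = sales.groupby("region")["units"].sum()

# Merge (join) with a second, equally hypothetical table on a shared key.
managers = pd.DataFrame({"region": ["North", "South"], "manager": ["Asha", "Ravi"]})
merged = sales.merge(managers, on="region", how="left")

# Label-based selection on the aggregated result.
north_total = totals.loc["North"]
print(north_total)
```

Each step returns or updates an ordinary pandas object, so the operations can be chained or inspected one at a time.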
    Pandas deals with the following data structures −
  • Series
  • DataFrame
    These data structures are built on top of NumPy arrays, making them fast and efficient.

    Dimension & Description

    The best way to think of these data structures is that a higher-dimensional structure is a container of lower-dimensional ones. For example, a DataFrame is a container of Series. Older pandas releases also offered a three-dimensional Panel, a container of DataFrames, which was removed in pandas 0.25.
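The container relationship can be checked directly. The following is a minimal sketch; the frame `df` and its column names `a` and `b` are placeholders chosen for the example.

```python
import numpy as np
import pandas as pd

# A small placeholder frame with two columns.
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

# Selecting a single column yields a Series that shares the frame's index.
column = df["a"]
print(type(column).__name__)
print(column.index.equals(df.index))

# The values of that Series can be obtained as a NumPy array.
values = column.to_numpy()
print(isinstance(values, np.ndarray))
```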

    | Data Structure | Dimensions | Description |
    |---|---|---|
    | DataFrame | 2 | General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns. |
    | Series | 1 | 1D labeled homogeneous array, size-immutable. |

    DataFrame is the most widely used of these data structures and the most important one.
    ## Series
    Series is a one-dimensional, array-like structure with homogeneous data. For example, the following series is a collection of integers 10, 23, 56, …

    10 23 56 98 62 45 98 59 64 46
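As a minimal sketch, the ten integers shown above can be put into a Series like this. The variable name `s` is arbitrary, and the default integer index (0 to 9) is what pandas assigns when no index is given.

```python
import pandas as pd

# Build a Series from the ten integers in the example.
s = pd.Series([10, 23, 56, 98, 62, 45, 98, 59, 64, 46])

print(len(s))     # number of elements
print(s.iloc[2])  # third element by position
print(s.dtype)    # a single dtype for the whole Series (homogeneous data)
```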

    ### Key Points of Series

  • Homogeneous data
  • Size Immutable
  • Values of Data Mutable
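A short sketch of what these points look like in practice, using a three-element placeholder Series. Values are changed in place, while one common way to obtain a longer Series is to build a new object with `pd.concat`; this is shown as an illustration, not as the only way to work with Series length.

```python
import pandas as pd

s = pd.Series([10, 23, 56])

# Values are mutable: overwrite the first element in place.
s.iloc[0] = 11

# Appending is expressed here by building a new, longer Series;
# the original object keeps its three elements.
longer = pd.concat([s, pd.Series([98])], ignore_index=True)

print(s.tolist())
print(longer.tolist())
```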

    ## DataFrame

    DataFrame is a two-dimensional array with heterogeneous data. For example,

    | Name | Age | Gender | Rating |
    |---|---|---|---|
    | Raju | 22 | Male | 2.89 |
    | Bheem | 25 | Male | 4.9 |
    | Jagdeesh | 21 | Male | 3.0 |
    | Krishna | 30 | Male | 5.00 |

    The table represents the data of a sales team of an organization with their overall performance rating. The data is represented in rows and columns. Each column represents an attribute and each row represents a person.
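The sales-team table can be built as a DataFrame from a dictionary of columns. This is one possible construction; the variable name `team` is an assumption made for the example.

```python
import pandas as pd

# One dictionary key per column (attribute), one list entry per row (person).
team = pd.DataFrame(
    {
        "Name": ["Raju", "Bheem", "Jagdeesh", "Krishna"],
        "Age": [22, 25, 21, 30],
        "Gender": ["Male", "Male", "Male", "Male"],
        "Rating": [2.89, 4.9, 3.0, 5.00],
    }
)

print(team.shape)        # (rows, columns)
row = team.iloc[0]       # the first person, as a Series
column = team["Rating"]  # the Rating attribute, as a Series
print(row["Name"], column.max())
```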
    ## Data Type of Columns
    The data types of the four columns are as follows −

    | Column | Type |
    |---|---|
    | Name | String |
    | Age | Integer |
    | Gender | String |
    | Rating | Float |
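The column types can be inspected with `dtypes`. In this sketch the `team` frame is rebuilt so the snippet runs on its own; note that the exact dtype label pandas prints for string columns depends on the installed version (often `object`, or a dedicated string dtype in newer releases), so the checks below use the type-testing helpers instead of comparing labels.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype, is_string_dtype

team = pd.DataFrame(
    {
        "Name": ["Raju", "Bheem", "Jagdeesh", "Krishna"],
        "Age": [22, 25, 21, 30],
        "Gender": ["Male", "Male", "Male", "Male"],
        "Rating": [2.89, 4.9, 3.0, 5.00],
    }
)

# One dtype per column; the printed labels may vary between pandas versions.
print(team.dtypes)

# Version-independent checks of the column kinds.
print(is_string_dtype(team["Name"]))
print(is_integer_dtype(team["Age"]))
print(is_float_dtype(team["Rating"]))
```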

    ### Key Points of Data Frame

  • Heterogeneous data
  • Size Mutable
  • Data Mutable
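These points can be illustrated with a small placeholder frame: a column is inserted, a value is updated in place, and a column is deleted. The names and values are taken from the example table, and the new rating 3.1 is made up for the demonstration.

```python
import pandas as pd

team = pd.DataFrame({"Name": ["Raju", "Bheem"], "Age": [22, 25]})

# Size mutable: insert a new column.
team["Rating"] = [2.89, 4.9]

# Data mutable: update a single cell by row label and column name.
team.loc[0, "Rating"] = 3.1

# Size mutable: delete a column.
del team["Age"]

print(team)
```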
    The next chapters include many examples of using the pandas library for data science work in Python.