Big Data Hadoop And Spark for Analytics (with Project Letter)

Out of Stock

18% GST Extra

Starting from: 01-10-2020

If interested, kindly fill in the inquiry form.

SKU: cid_94604

Duration: 6 Days

Course Outline

Introduction to Big Data and Hadoop

  • What is Big Data?
  • Types of Data
  • Need for Big Data
  • Characteristics of Big Data
  • Traditional IT Analytics Approach
  • Big Data—Use Cases
  • Handling Limitations of Big Data
  • Introduction to Hadoop
  • History and Milestones of Hadoop

Getting Started with Hadoop

  • Introduction to VirtualBox / VMware Player
  • Installing VirtualBox / VMware Player
  • Setting up the Virtual Environment
  • Installation of Hadoop VM

Hadoop Architecture

  • Hadoop Cluster on commodity hardware
  • Hadoop core services and components
  • Regular file system vs. Hadoop
  • HDFS Features
  • HDFS operations


MapReduce

  • Introduction to MapReduce
  • Hadoop MapReduce example
  • Hadoop MapReduce Characteristics
  • Setting up your MapReduce Environment
  • Building a MapReduce Program
  • MapReduce Requirements and Features
  • Data Types
  • MapReduce Java Programming in Eclipse
  • Checking Hadoop Environment for MapReduce


YARN

  • What is YARN?
  • Why YARN is Needed
  • YARN Architecture


Pig

  • Background
  • Pig Architecture
  • Data Types
  • Data Loading and storage
  • Data Transformation
  • Pig: Syntax, Examples and Hands-On
  • Examples using Pig Scripts
  • Hands-On Real-Time Project on Pig


Hive

  • Background
  • Hive Architecture
  • Metastore
  • Data Types
  • Data Loading and storage
  • Data Transformation
  • Hive: Syntax, Examples and Hands-On
  • Examples using Hive Scripts
  • User Defined Functions
  • Hands-On Real-Time Project on Hive


Data Ingestion

  • Introduction to the Data Ingestion Tool
  • Data transfer from RDBMS into
  • Data transfer from HDFS
  • Other Operations

Introduction to Python

  • Python Programming
  • Data Types and Strings
  • Flow Constructs
  • Functions
  • Lists and Dictionaries
  • File Input and Output
  • Arrays using NumPy
  • Plotting using Matplotlib
  • DataFrames using pandas
  • Data Analysis

Getting Started with Spark

  • Download Spark
  • Install Spark
  • Spark Languages
  • Using the Spark Shell

Spark Core Concepts

  • Resilient Distributed Datasets (RDDs)
  • Functional Programming with Spark
  • Working with RDDs
  • RDD Operations
  • Key-Value Pair RDDs
  • Pair RDD Operations
  • Load Data File into Spark
  • Save Files
  • Data Partitioning

Running Spark on a Cluster

  • A Spark Standalone Cluster
  • The Spark Standalone Web UI
  • Spark on Hadoop Cluster
  • Scheduling

Parallel Programming with Spark

  • RDD Partitions
  • HDFS Data Locality
  • Executing Parallel Operations

Caching and Persistence

  • RDD Lineage
  • Caching Overview
  • Distributed Persistence

Spark SQL

  • SchemaRDD
  • DataFrame and Dataset
  • SparkSession
  • SQL Operations

Spark MLlib

  • What is Machine Learning
  • Supervised Machine Learning
  • Unsupervised Machine Learning
  • Algorithms used in Machine Learning
  • Data Types in MLlib
  • Building Machine Learning Applications

Advanced Spark Features

  • Spark Performance
  • Shared Variables: Broadcast Variables
  • Shared Variables: Accumulators
  • Common Performance Issues
  • Concurrency Limitation
  • Security Features
  • Memory Usage and Garbage Collection
  • Serialization

Spark and the Hadoop Ecosystem

Spark vs. MapReduce Programming

Major Projects

  1. Project 1
    • Movie Recommendation
  2. Project 2
    • Self Designed Project

Interview Questions and Quiz Discussion

Test & Evaluation

Each lecture will have a quiz containing a set of multiple-choice questions. Apart from that, there will be a final test based on multiple-choice questions.

Your evaluation will include the overall scores achieved in each lecture quiz and the final test.

  • Time-saving and cost-effective
  • Training by industry experts (corporate trainers with 10+ years of experience in the field)
  • Hands-on practical exposure throughout for better understanding
  • Adds solid value to your professional career
  • Weekend doubt-clearing sessions

For inquiries, call: 9910043510

Online Live Training Program 2020