Big Data Hadoop And Spark for Analytics (with Internship + Project Letter)

Out of Stock




18% GST Extra

Starting from: 01-10-2020

If interested, kindly fill in the inquiry form.



6 Days

Course Outline

Introduction to Big Data and Hadoop

  • What is Big Data?
  • Types of Data
  • Need for Big Data
  • Characteristics of Big Data
  • Traditional IT Analytics Approach
  • Big Data—Use Cases
  • Handling Limitations of Big Data
  • Introduction to Hadoop
  • History and Milestones of Hadoop

Getting Started with Hadoop

  • VirtualBox / VMware Player: Introduction
  • Installing VirtualBox / VMware Player
  • Setting up the Virtual Environment
  • Installation of Hadoop VM

Hadoop Architecture

  • Hadoop Cluster on commodity hardware
  • Hadoop core services and components
  • Regular file system vs. Hadoop
  • HDFS Features
  • HDFS operations


MapReduce

  • Introduction to MapReduce
  • Hadoop MapReduce example
  • Hadoop MapReduce Characteristics
  • Setting up your MapReduce Environment
  • Building a MapReduce Program
  • MapReduce Requirements and Features
  • Data Types
  • MapReduce Java Programming in Eclipse
  • Checking Hadoop Environment for MapReduce
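
As a rough illustration of the MapReduce topics above, the sketch below runs the classic word-count example in plain Python rather than on Hadoop. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for this example; a real Hadoop job would be written against the Hadoop MapReduce API, typically in Java.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data needs big tools", "Hadoop stores big data"]
    print(word_count(sample))
```

On a cluster, the map and reduce steps would run in parallel on different nodes, and the framework would perform the shuffle; here everything runs in one process to keep the idea visible.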


YARN

  • What is YARN?
  • Why YARN Is Needed
  • YARN Architecture


Pig

  • Background
  • Pig Architecture
  • Data Types
  • Data Loading and storage
  • Data Transformation
  • Pig: Syntax, Example and Hands-On
  • Examples using Pig Scripts
  • Hands-On Real-Time Project on Pig


Hive

  • Background
  • Hive Architecture
  • Metastore
  • Data Types
  • Data Loading and storage
  • Data Transformation
  • Hive: Syntax, Example and Hands-On
  • Examples using Hive Scripts
  • User Defined Functions
  • Hands-On Real-Time Project on Hive


Data Ingestion

  • Introduction to data ingestion tool
  • Data transfer from RDBMS into
  • Data transfer from HDFS
  • Other Operations

Introduction to Python

  • Python Programming
  • Data Types and Strings
  • Flow Constructs
  • Functions
  • Lists and Dictionaries
  • File Input and Output
  • Arrays using NumPy
  • Plotting using Matplotlib
  • DataFrames using Pandas
  • Data Analysis

Getting Started with Spark

  • Download Spark
  • Install Spark
  • Spark Languages
  • Using the Spark Shell

Spark Core Concepts

  • Resilient Distributed Datasets (RDDs)
  • Functional Programming with Spark
  • Working with RDDs
  • RDD Operations
  • Key-Value Pair RDDs
  • Pair RDD Operations
  • Load Data File into Spark
  • Save Files
  • Data Partitioning
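
To give a feel for the pair RDD and data partitioning topics above, here is a minimal plain-Python sketch of hash partitioning followed by a per-key sum. It does not use Spark; `partition_for`, `partition_pairs`, and `reduce_by_key` are hypothetical helpers written for this example, and `zlib.crc32` is an assumed stand-in for whatever hash function a real engine uses.

```python
import zlib


def partition_for(key, num_partitions):
    """Pick a partition index from a stable hash of the key."""
    # crc32 is used because Python's built-in hash() of strings changes
    # between runs; this is an illustration, not Spark's own hashing.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_pairs(pairs, num_partitions):
    """Distribute (key, value) pairs so equal keys share a partition."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


def reduce_by_key(partitions):
    """Sum the values for each key, working one partition at a time."""
    result = {}
    for part in partitions:
        local = {}
        for key, value in part:
            local[key] = local.get(key, 0) + value
        # Safe to merge directly: each key lives in exactly one partition.
        result.update(local)
    return result


if __name__ == "__main__":
    pairs = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
    print(reduce_by_key(partition_pairs(pairs, 3)))
```

Because all values for a key land in the same partition, each partition can be aggregated independently, which is the property that makes per-key operations parallelizable.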

Running Spark on a Cluster

  • A Spark Standalone Cluster
  • The Spark Standalone Web UI
  • Spark on Hadoop Cluster
  • Scheduling

Parallel Programming with Spark

  • RDD Partitions
  • HDFS Data Locality
  • Executing Parallel Operations

Caching and Persistence

  • RDD Lineage
  • Caching Overview
  • Distributed Persistence
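
The lineage and caching ideas listed above can be sketched with a toy class in plain Python. `ToyDataset` is a hypothetical stand-in invented for this example, not part of Spark: it records the chain of transformations that produced it (its lineage), computes results only when `collect()` is called, and reuses a stored result once `cache()` has been requested.

```python
class ToyDataset:
    """Hypothetical stand-in for an RDD: lazy, with lineage and a cache."""

    def __init__(self, source, parent=None, func=None, label="source"):
        self._source = source        # only set on the root dataset
        self._parent = parent        # dataset this one was derived from
        self._func = func            # transformation applied to the parent
        self._label = label
        self._cache_enabled = False
        self._cached = None
        self.compute_count = 0       # how many times this node was computed

    def map(self, func, label="map"):
        """Record a transformation; nothing is computed yet."""
        return ToyDataset(None, parent=self, func=func, label=label)

    def cache(self):
        """Ask for the result to be kept after the first computation."""
        self._cache_enabled = True
        return self

    def lineage(self):
        """Return the labels of every step from the source to this node."""
        chain = [] if self._parent is None else self._parent.lineage()
        return chain + [self._label]

    def collect(self):
        """Compute the data, replaying the lineage unless a cache exists."""
        if self._cached is not None:
            return list(self._cached)
        self.compute_count += 1
        if self._parent is None:
            data = list(self._source)
        else:
            data = [self._func(x) for x in self._parent.collect()]
        if self._cache_enabled:
            self._cached = data
        return list(data)


if __name__ == "__main__":
    base = ToyDataset([1, 2, 3])
    squared = base.map(lambda x: x * x, label="square").cache()
    plus_one = squared.map(lambda x: x + 1, label="plus_one")
    print(plus_one.lineage())
    print(plus_one.collect())
```

Calling `collect()` twice on `plus_one` recomputes only the uncached last step; the cached `squared` node and everything upstream of it are computed once.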

Spark SQL

  • SchemaRDD
  • DataFrame and Dataset
  • SparkSession
  • SQL Operations

Spark MLlib

  • What is Machine Learning
  • Supervised Machine Learning
  • Unsupervised Machine Learning
  • Algorithms used in Machine Learning
  • Data Types in MLlib
  • Building Machine Learning Applications

Advanced Spark Features

  • Spark Performance
  • Shared Variables: Broadcast Variables
  • Shared Variables: Accumulators
  • Common Performance Issues
  • Concurrency Limitation
  • Security Features
  • Memory Usage and Garbage Collection
  • Serialization

Spark and the Hadoop Ecosystem

Spark vs. MapReduce Programming

Major Projects

  1. Project 1
    • Movie Recommendation
  2. Project 2
    • Self Designed Project

Interview Questions and Quiz Discussion

Test & Evaluation

Each lecture has a quiz made up of multiple-choice questions, and the course ends with a final multiple-choice test.

Your evaluation combines your scores across the lecture quizzes and the final test.

  • Time-saving and cost-effective
  • Training by industry experts and corporate trainers with 10+ years of experience in the field
  • Hands-on practical exposure for better understanding
  • Adds solid value to your professional career
  • Weekend doubt-clearing sessions

For inquiries, call: 9910043510

Online Live Training Program 2020