Big Data Hadoop And Spark For Developers (with Internship + Project Letter)

Out of Stock

Please register to enroll in this course.

18% GST Extra

Starting from: 01-10-2020

If interested, kindly fill in the inquiry form.

30 Hours (2 hours per Session, 15 Sessions)

Course Outline

Introduction to Big Data and Hadoop

  • What is Big Data?
  • Types of Data
  • Need for Big Data
  • Characteristics of Big Data
  • Traditional IT Analytics Approach
  • Big Data—Use Cases
  • Handling Limitations of Big Data
  • Introduction to Hadoop
  • History and Milestones of Hadoop

Getting Started with Hadoop

  • Introduction to VirtualBox / VMware Player
  • Installing VirtualBox / VMware Player
  • Setting up the Virtual Environment
  • Installation of Cloudera VM

Hadoop Architecture

  • Hadoop Cluster on commodity hardware
  • Hadoop core services and components
  • Regular file system vs. Hadoop
  • HDFS layer
  • HDFS Operations

MapReduce

  • Introduction to MapReduce
  • Hadoop MapReduce example
  • Hadoop MapReduce Characteristics
  • Setting up your MapReduce Environment
  • Building a MapReduce Program
  • MapReduce Requirements and Features
  • Data Types
  • MapReduce Java Programming in Eclipse
  • Checking Hadoop Environment for MapReduce
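
The MapReduce flow covered in this module runs in three steps: map, then shuffle and sort, then reduce. A classic word count illustrates it. The sketch below is a plain-Python, in-memory simulation for illustration only. It does not use Hadoop, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical. The course's own hands-on MapReduce programs are written in Java.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical in-memory illustration of the MapReduce data flow.
# It does NOT use Hadoop; it only mimics the map, shuffle/sort and reduce phases.

def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield (word, 1)

def shuffle_phase(pairs):
    """Shuffle/sort: group all values by key; keys are returned sorted,
    mirroring how Hadoop hands keys to a reducer in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return (key, sum(values))

def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return [reduce_phase(key, values) for key, values in shuffle_phase(mapped)]

if __name__ == "__main__":
    docs = ["Hadoop stores data", "Spark and Hadoop process data"]
    for word, count in word_count(docs):
        print(word, count)
```

Running it prints one line per distinct word, with keys in sorted order.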

YARN

  • What Is YARN?
  • Why YARN Is Needed
  • YARN Architecture

Pig

  • Background
  • Pig Architecture
  • Data Types
  • Data Loading and storage
  • Data Transformation
  • Pig: Syntax, Examples, and Hands-On Practice Using Pig Scripts
  • Hands-On Real-time Project on Pig

Hive

  • Background
  • Hive Architecture
  • Metastore
  • Data Types
  • Data Loading and storage
  • Data Transformation
  • Hive: Syntax, Examples, and Hands-On Practice Using Hive Scripts
  • User-Defined Functions
  • Hands-On Real-time Project on Hive

Data Ingestion

  • Introduction to Data Ingestion Tools
  • Data transfer from RDBMS
  • Data transfer from HDFS

Getting Started with Spark

  • Download Spark
  • Install Spark
  • Spark Languages
  • Using the PySpark Shell

Spark Core Concepts

  • Resilient Distributed Datasets (RDDs)
  • Functional Programming with Spark
  • Working with RDDs
  • RDD Operations
  • Key-Value Pair RDDs
  • Pair RDD Operations
  • Load Data File into Spark
  • Save Files
  • Data Partitioning
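
Pair RDD operations such as reduceByKey depend on data partitioning. Every pair with the same key must end up in the same partition before its values can be combined. The plain-Python sketch below assumes a simple hash partitioner (key hash modulo the number of partitions). It uses zlib.crc32 only so that results are deterministic across runs.

The helpers partition_for and reduce_by_key are hypothetical, not the PySpark API. The sketch also omits the map-side combining that Spark performs before the shuffle. In PySpark itself the corresponding calls are rdd.partitionBy(n) and rdd.reduceByKey(func).

```python
import zlib
from collections import defaultdict
from functools import reduce

# Plain-Python sketch of hash partitioning plus a reduceByKey-style operation.
# NOT the Spark API: partition_for() and reduce_by_key() are hypothetical helpers.

def partition_for(key, num_partitions):
    """Assign a key to a partition: hash(key) mod num_partitions.
    crc32 is used (instead of hash()) so the result is the same on every run."""
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions

def reduce_by_key(pairs, func, num_partitions=2):
    # 1. Shuffle: route each (key, value) pair to the partition that owns its key,
    #    so all values for one key land together.
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[partition_for(key, num_partitions)][key].append(value)
    # 2. Reduce: within each partition, fold every key's values with func.
    result = {}
    for part in partitions:
        for key, values in part.items():
            result[key] = reduce(func, values)
    return result

if __name__ == "__main__":
    pairs = [("spark", 1), ("hadoop", 1), ("spark", 1), ("hive", 1)]
    print(reduce_by_key(pairs, lambda a, b: a + b, num_partitions=3))
```

Because func is applied pairwise in no guaranteed order across partitions in real Spark, it should be associative (and commutative), as addition is here.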

Running Spark on a Cluster

  • A Spark Standalone Cluster
  • The Spark Standalone Web UI
  • Spark on Hadoop Cluster
  • Spark on Cloud
  • Scheduling

Parallel Programming with Spark

  • RDD Partitions
  • HDFS Data Locality
  • Executing Parallel Operations
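
Executing an operation in parallel means running the same task independently on each partition, then combining the partial results. The sketch below uses a thread pool as a stand-in for Spark executors. The names make_partitions, sum_of_squares and parallel_sum_of_squares are hypothetical and chosen for illustration. In PySpark, a per-partition step like this would typically be expressed with an operation such as mapPartitions.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustration only (not Spark): split data into partitions, run one task per
# partition concurrently, then combine the partial results in the "driver".

def make_partitions(data, num_partitions):
    """Split a list into num_partitions contiguous, roughly equal slices."""
    size, extra = divmod(len(data), num_partitions)
    parts, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < extra else 0)  # spread the remainder
        parts.append(data[start:end])
        start = end
    return parts

def sum_of_squares(partition):
    """Task executed independently on a single partition."""
    return sum(x * x for x in partition)

def parallel_sum_of_squares(data, num_partitions=4):
    parts = make_partitions(data, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(sum_of_squares, parts))
    return sum(partials)  # combine step, like an action returning to the driver

if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(1, 11))))
```

Threads keep the example short. In CPython, CPU-bound work would need ProcessPoolExecutor (or real Spark executors) to run truly in parallel, because of the global interpreter lock.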

Caching and Persistence

  • RDD Lineage
  • Caching Overview
  • Distributed Persistence
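
The toy class below makes RDD lineage and caching concrete. It is hypothetical, not PySpark. Each derived dataset records its parent and the transformation that produced it. Transformations are lazy. An action such as collect recomputes the whole chain from lineage, unless an intermediate dataset has been cached.

The computations counter is an assumption added only to show how much work caching saves. In PySpark, the corresponding real calls are rdd.cache() and rdd.toDebugString() for inspecting lineage.

```python
class ToyRDD:
    """Toy stand-in for an RDD, illustrating lineage and caching (not PySpark)."""

    def __init__(self, source=None, parent=None, func=None, op="source"):
        self.source = source      # base data; only set on the root dataset
        self.parent = parent      # lineage link to the dataset this was derived from
        self.func = func          # transformation applied to the parent's data
        self.op = op              # name of the transformation, used by lineage()
        self.persist = False      # set to True by cache()
        self._cached = None       # stored data once cached and computed
        self.computations = 0     # how many times this dataset was (re)computed

    def map(self, f):
        # Transformations are lazy: they only record lineage, nothing runs yet.
        return ToyRDD(parent=self, func=lambda data: [f(x) for x in data], op="map")

    def filter(self, pred):
        return ToyRDD(parent=self, func=lambda data: [x for x in data if pred(x)], op="filter")

    def cache(self):
        # Mark for persistence; data is stored the next time it is computed.
        self.persist = True
        return self

    def _compute(self):
        if self._cached is not None:
            return self._cached                       # served from cache
        self.computations += 1
        if self.parent is None:
            data = list(self.source)
        else:
            data = self.func(self.parent._compute())  # rebuild from lineage
        if self.persist:
            self._cached = data
        return data

    def collect(self):
        # Actions trigger evaluation of the lineage chain; return a copy.
        return list(self._compute())

    def lineage(self):
        """Operation names from the root dataset down to this one."""
        chain = [] if self.parent is None else self.parent.lineage()
        return chain + [self.op]
```

For example, with base = ToyRDD(source=range(5)), squares = base.map(lambda x: x * x) and evens = squares.filter(lambda x: x % 2 == 0), evens.lineage() returns ['source', 'map', 'filter']. Every evens.collect() recomputes base until squares.cache() is called.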

Spark SQL

  • SchemaRDD
  • DataFrame and Dataset
  • SparkSession
  • SQL Operations

Common Performance Issues

  • Concurrency Limitation
  • Security Features
  • Memory Usage and Garbage Collection
  • Serialization

Test & Evaluation

Each lecture will have a quiz containing a set of multiple-choice questions. Apart from that, there will be a final test based on multiple-choice questions.

Your evaluation will be based on your scores in each lecture quiz and in the final test.

  • Time-saving & Cost-effective
  • Training by industry experts and corporate trainers with 10+ years of experience in the field
  • Extensive hands-on practice for better understanding
  • Adds solid value to your professional career
  • Weekend doubt-clearing sessions

For inquiries, call: 9910043510

Online Live Training Program 2020