Big Data Hadoop And Spark For Developers (with Project Letter)

Out of Stock

18% GST Extra

Starting from: 01-10-2020

If interested, kindly fill in the inquiry form.

SKU: cid_95050

30 Hours (2 hours per Session, 15 Sessions)

Course Outline

Introduction to Big Data and Hadoop

  • What is Big Data?
  • Types of Data
  • Need for Big Data
  • Characteristics of Big Data
  • Traditional IT Analytics Approach
  • Big Data: Use Cases
  • Handling Limitations of Big Data
  • Introduction to Hadoop
  • History and Milestones of Hadoop

Getting Started with Hadoop

  • VirtualBox / VMware Player: Introduction
  • Installing VirtualBox / VMware Player
  • Setting up the Virtual Environment
  • Installation of Cloudera VM

Hadoop Architecture

  • Hadoop Cluster on commodity hardware
  • Hadoop core services and components
  • Regular file system vs. Hadoop
  • HDFS layer
  • HDFS Operations

MapReduce

  • Introduction to MapReduce
  • Hadoop MapReduce example
  • Hadoop MapReduce Characteristics
  • Setting up your MapReduce Environment
  • Building a MapReduce Program
  • MapReduce Requirements and Features
  • Data Types
  • MapReduce Java Programming in Eclipse
  • Checking Hadoop Environment for MapReduce

YARN

  • What is YARN?
  • Why YARN Is Needed
  • YARN Architecture

Pig

  • Background
  • Pig Architecture
  • Data Types
  • Data Loading and storage
  • Data Transformation
  • Pig: Syntax and Hands-On Examples Using Pig Scripts
  • Hands-On Real-time Project on Pig

Hive

  • Background
  • HIVE Architecture
  • Metastore
  • Data Types
  • Data Loading and storage
  • Data Transformation
  • Hive: Syntax and Hands-On Examples Using Hive Scripts
  • User-Defined Functions
  • Hands-On Real-time Project on Hive

Data Ingestion

  • Introduction to data ingestion tool
  • Data transfer from RDBMS
  • Data transfer from HDFS

Getting Started with Spark

  • Download Spark
  • Install Spark
  • Spark Languages
  • Using PySpark

Spark Core Concepts

  • Resilient Distributed Datasets (RDDs)
  • Functional Programming with Spark
  • Working with RDDs
  • RDD Operations
  • Key-Value Pair RDDs
  • Pair RDD Operations
  • Load Data File into Spark
  • Save Files
  • Data Partitioning

Running Spark on a Cluster

  • A Spark Standalone Cluster
  • The Spark Standalone Web UI
  • Spark on Hadoop Cluster
  • Spark on Cloud
  • Scheduling

Parallel Programming with Spark

  • RDD Partitions
  • HDFS Data Locality
  • Executing Parallel Operations

Caching and Persistence

  • RDD Lineage
  • Caching Overview
  • Distributed Persistence

Spark SQL

  • SchemaRDD
  • DataFrame and Dataset
  • SparkSession
  • SQL Operations

Common Performance Issues

  • Concurrency Limitation
  • Security Features
  • Memory Usage and Garbage Collection
  • Serialization

Test & Evaluation

Each lecture has a quiz made up of multiple-choice questions. The course also ends with a final test of multiple-choice questions.

Your evaluation is based on your scores across the lecture quizzes and the final test.

  • Time-saving and cost-effective
  • Training by industry experts (corporate trainers with 10+ years of experience in the field)
  • Hands-on practical exposure for better understanding
  • Adds solid value to your professional career
  • Weekend doubt-clearing sessions

For inquiries, call: 9910043510

Online Live Training Program 2020