Spark For Developers

Rs.8,475.00

Please register to enroll in this course.

18% GST Extra

Starting from: 01-10-2020

If interested, kindly fill in the inquiry form.

SKU: cid_94989

Duration

3 days

Test & Evaluation

Each lecture has a quiz made up of multiple-choice questions. In addition, there is a final test, also based on multiple-choice questions.

Your evaluation will include the overall scores achieved in each lecture quiz and the final test.

Course Outline

Getting Started with Spark

  • Download Spark
  • Install Spark
  • Spark Languages
  • Using the Spark Shell

Introduction to Scala

  • Functional Programming
  • Object-Oriented Programming
  • Features of Scala
  • Programming with Scala
  • Classes, Case Classes, and Traits

Spark Core Concepts

  • Resilient Distributed Datasets (RDDs)
  • Functional Programming with Spark
  • Working with RDDs
  • RDD Operations
  • Key-Value Pair RDDs
  • Pair RDD Operations
  • Loading Data Files into Spark
  • Saving Files
  • Data Partitioning

Running Spark on a Cluster

  • A Spark Standalone Cluster
  • The Spark Standalone Web UI
  • Spark on a Hadoop Cluster
  • Scheduling

Parallel Programming with Spark

  • RDD Partitions
  • HDFS Data Locality
  • Executing Parallel Operations

Writing Spark Applications

  • Building a Spark Application Using SBT
  • Building a Spark Application Using Maven
  • IDE Setup
  • Spark Applications vs. Spark Shell
  • Creating the SparkContext
  • Configuring Spark Properties
  • Building and Running a Spark Application
  • Deploying an Application on a Cluster
  • Logging

Caching and Persistence

  • RDD Lineage
  • Caching Overview
  • Distributed Persistence

Spark SQL

  • SchemaRDD (renamed DataFrame in Spark 1.3)
  • DataFrame and Dataset
  • SparkSession
  • SQL Operations

Spark Streaming

  • Spark Streaming Overview
  • Example: Streaming Word Count
  • Other Streaming Operations
  • Sliding Window Operations
  • Developing Spark Streaming Applications

Advanced Spark Features

  • Spark Performance
  • Shared Variables: Broadcast Variables
  • Shared Variables: Accumulators

Common Performance Issues

  • Concurrency Limitation
  • Security Features
  • Memory Usage and Garbage Collection
  • Serialization

Benefits

  • Time-saving and cost-effective
  • Training from industry experts and corporate trainers with 10+ years of experience in the field
  • Hands-on practical exposure for better understanding
  • Adds solid value to your professional career
  • Weekend doubt-clearing sessions

For inquiries, call: 9910043510

Online Live Training Program 2020
