Spark Development

August 16, 2018

Course Outline
Introduction to Apache Hadoop and the Hadoop Ecosystem

Introduction to Apache Hadoop and the Hadoop Ecosystem
Apache Hadoop Overview
Data Ingestion and Storage
Data Processing
Data Analysis and Exploration
Other Ecosystem Tools
Introduction to the Hands-On Exercises

Apache Hadoop File Storage

Apache Hadoop Cluster Components
HDFS Architecture
Using HDFS

Distributed Processing on an Apache Hadoop Cluster

YARN Architecture
Working With YARN

Apache Spark Basics

What is Apache Spark?
Starting the Spark Shell
Using the Spark Shell
Getting Started with Datasets and DataFrames
DataFrame Operations

Working with DataFrames and Schemas

Creating DataFrames from Data Sources
Saving DataFrames to Data Sources
DataFrame Schemas
Eager and Lazy Execution

Analyzing Data with DataFrame Queries

Querying DataFrames Using Column Expressions
Grouping and Aggregation Queries
Joining DataFrames

RDD Overview

RDD Overview
RDD Data Sources
Creating and Saving RDDs
RDD Operations

Transforming Data with RDDs

Writing and Passing Transformation Functions
Transformation Execution
Converting Between RDDs and DataFrames

Aggregating Data with Pair RDDs

Key-Value Pair RDDs
Map-Reduce
Other Pair RDD Operations

Querying Tables and Views with Apache Spark SQL

Querying Tables in Spark Using SQL
Querying Files and Views
The Catalog API
Comparing Spark SQL, Apache Impala, and Apache Hive-on-Spark

Working with Datasets in Scala

Datasets and DataFrames
Creating Datasets
Loading and Saving Datasets
Dataset Operations

Writing, Configuring, and Running Apache Spark Applications

Writing a Spark Application
Building and Running an Application
Application Deployment Mode
The Spark Application Web UI
Configuring Application Properties

Distributed Processing

Review: Apache Spark on a Cluster
RDD Partitions
Example: Partitioning in Queries
Stages and Tasks
Job Execution Planning
Example: Catalyst Execution Plan
Example: RDD Execution Plan

Distributed Data Persistence

DataFrame and Dataset Persistence
Persistence Storage Levels
Viewing Persisted RDDs

Common Patterns in Apache Spark Data Processing

Common Apache Spark Use Cases
Iterative Algorithms in Apache Spark
Machine Learning
Example: k-means

Apache Spark Streaming: Introduction to DStreams

Apache Spark Streaming Overview
Example: Streaming Request Count
DStreams
Developing Streaming Applications

Apache Spark Streaming: Processing Multiple Batches

Multi-Batch Operations
Time Slicing
State Operations
Sliding Window Operations
Preview: Structured Streaming

Apache Spark Streaming: Data Sources

Streaming Data Source Overview
Apache Flume and Apache Kafka Data Sources
Example: Using a Kafka Direct Data Source
