Name: Hadoop Developer Foundation | Explore Hadoop, HDFS, Hive, Yarn & More
Brand: Trivera Technologies LLC
SKU: 1408070

Course description

Hadoop Developer Foundation | Working with Hadoop, HDFS, Hive, Yarn, Spark and More explores processing large data streams in the Hadoop ecosystem. Working in a hands-on learning environment, students learn techniques and tools for ingesting, transforming, and exporting data to and from the Hadoop ecosystem, and for processing that data with MapReduce and other critical tools, including Hive and Pig. Towards the end of the course, we’ll introduce other useful tools such as Spark and Oozie and discuss essential security in the ecosystem.

This “skills-centric” course is about 50% hands-on lab and 50% lecture, designed to train attendees in core big data and Spark development skills, coupling current, effective techniques with sound industry practices. Throughout the course, students are led through a series of progressively advanced topics, each consisting of lecture, group discussion, comprehensive hands-on lab exercises, and lab review.

Who should attend?

This intermediate-level course is geared for experienced developers seeking to become proficient in Hadoop, Spark, and related technologies. Attendees should be experienced developers who are comfortable programming in Python. They should also be able to navigate the Linux command line and have basic knowledge of a Linux editor (such as vi or nano) for editing code.

In order to gain the most from this course, attending students should be:

Familiar with basic Python programming

Comfortable in a Linux environment (able to navigate the Linux command line and edit files using vi or nano)

Training content

Day One

Introduction to Hadoop

Hadoop history, concepts
Ecosystem
Distributions
High-level architecture
Hadoop myths
Hadoop challenges
Hardware and software
Lab: first look at Hadoop

HDFS

Design and architecture
Concepts (horizontal scaling, replication, data locality, rack awareness)
Daemons: Namenode, Secondary Namenode, Datanode
Communications and heartbeats
Data integrity
Read and write path
Namenode High Availability (HA), Federation
Labs: Interacting with HDFS
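
As a rough illustration of the block and replication concepts listed above, the arithmetic can be sketched in plain Python. This is not course material: the function name `hdfs_footprint` is hypothetical, and the 128 MB block size and replication factor of 3 are assumed values that a real cluster may override.

```python
def hdfs_footprint(file_size_bytes, block_size_bytes=128 * 1024 * 1024, replication=3):
    """Estimate how a single file is laid out in HDFS.

    Returns (number of blocks, total block replicas, raw bytes stored).
    Block size and replication factor are assumed defaults, not cluster facts.
    """
    if file_size_bytes < 0 or block_size_bytes <= 0 or replication <= 0:
        raise ValueError("sizes and replication must be positive")
    # Ceiling division: a partially filled last block still counts as a block.
    num_blocks = -(-file_size_bytes // block_size_bytes)
    total_replicas = num_blocks * replication
    # The last block only occupies the bytes it actually holds,
    # so raw usage is the file size times the replication factor.
    raw_bytes = file_size_bytes * replication
    return num_blocks, total_replicas, raw_bytes


if __name__ == "__main__":
    mib = 1024 * 1024
    blocks, replicas, raw = hdfs_footprint(300 * mib)
    print(blocks, replicas, raw // mib)
```

Under these assumptions, a 300 MiB file splits into three blocks (128 + 128 + 44 MiB), each stored three times.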

Day Two

YARN

YARN Concepts and architecture
Evolution from MapReduce to YARN
Labs: Running a sample YARN program
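
The MapReduce model that this module builds on can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle, and reduce phases, not Hadoop code; all function names here are hypothetical and chosen only for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Spark processes data"]))
```

On a cluster, the map and reduce calls would run as separate tasks on different nodes and the shuffle would move data over the network; the sketch only shows the data flow.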

Data Ingestion

Flume for logs and other data ingestion into HDFS
Sqoop for importing from SQL databases to HDFS, as well as exporting back to SQL
Copying data between clusters (distcp)
Using S3 as complementary to HDFS
Data ingestion best practices and architectures
Oozie for scheduling events on Hadoop
Labs: setting up and using Flume and Sqoop

HBase

(Covered in brief)
Concepts and architecture
HBase vs RDBMS vs Cassandra
HBase Java API
Time series data on HBase
Schema design
Labs: Interacting with HBase using shell; programming in HBase Java API; Schema design exercise

Oozie

Introduction to Oozie
Features of Oozie
Oozie Workflow
Creating a MapReduce Workflow
Start, End, and Error Nodes
Parallel Fork and Join Nodes
Workflow Jobs Lifecycle
Workflow Notifications
Workflow Manager
Creating and Running a Workflow
Exercise: Create an Oozie Workflow from Terminal
Exercise: Create an Oozie Workflow Using Java API
Oozie Coordinator Sub-groups
Oozie Coordinator Components, Variables, and Parameters
Exercise: Create an Oozie Workflow from HUE

Day Three

Working with Hive

Architecture and design
Data types
SQL support in Hive
Creating Hive tables and querying
Partitions
Joins
Text processing
Labs: various labs on processing data with Hive

Hive (Advanced)

Transformation, Aggregation
Working with Dates, Timestamps, and Arrays
Converting Strings to Date, Time, and Numbers
Create new Attributes, Mathematical Calculations, Windowing Functions
Use Character and String Functions
Binning and Smoothing
Processing JSON Data
Execution Engines (Tez, MR, Spark)
Many labs

Day Four

Hive in Cloudera (or tools of choice)

Working with Spark

Spark Basics

Big Data, Hadoop, Spark
What’s new in Spark v2
Spark concepts and architecture
Spark ecosystem (Core, Spark SQL, MLlib, Streaming)
Labs: Installing and running Spark

Spark Shell

Spark web UIs
Analyzing dataset – part 1
Labs: Spark shell exploration

RDDs (Condensed coverage)

RDDs concepts
RDD Operations / transformations
Labs: Unstructured data analytics using RDDs
Data model concepts
Partitions
Distributed processing
Failure handling
Caching and persistence
Lab on the above

Spark Dataframes & Datasets

Intro to Dataframe / Dataset
Programming in Dataframe / Dataset API
Loading structured data using Dataframes
Labs: Dataframes, Datasets, Caching

Spark SQL

Spark SQL concepts and overview
Defining tables and importing datasets
Querying data using SQL
Handling various storage formats: JSON / Parquet / ORC
Labs: querying structured data using SQL; evaluating data formats

Spark API programming (Scala and Python)

Introduction to Spark API
Submitting the first program to Spark
Debugging / logging
Configuration properties
Labs: Programming in Spark API, Submitting jobs

Spark and Hadoop

Hadoop Primer: HDFS / YARN
Hadoop + Spark architecture
Running Spark on YARN
Processing HDFS files using Spark
Spark & Hive
Lab

Capstone project

Team design workshop
The class will be broken into teams
The teams will get a name and a task
They will architect a complete solution to a specific useful problem, present it, and defend the architecture based on the best practices they have learned in class

Optional Additional Topics – Please Inquire for Details

Machine Learning (ML / MLlib)

Machine Learning primer
Machine Learning in Spark: MLlib / ML
Spark ML overview (newer Spark2 version)
Algorithms: Clustering, Classifications, Recommendations
Labs: Writing ML applications in Spark

GraphX

GraphX library overview
GraphX APIs
Labs: Processing graph data using Spark

Spark Streaming

Streaming concepts
Evaluating Streaming platforms
Spark streaming library overview
Streaming operations
Sliding window operations
Structured Streaming
Continuous streaming
Spark & Kafka streaming
Labs: Writing spark streaming applications
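
The sliding window idea listed above can be illustrated without a streaming engine. The following is a minimal plain-Python sketch, assuming the stream has already been cut into micro-batches of numbers; `sliding_window_sums` and its parameters are hypothetical names, and both the window length and the slide interval are counted in batches rather than seconds.

```python
def sliding_window_sums(batches, window_length, slide_interval):
    """Sum the values in each window over a list of micro-batches.

    A result is produced every `slide_interval` batches, covering the most
    recent `window_length` batches (fewer while the stream is warming up).
    """
    if window_length <= 0 or slide_interval <= 0:
        raise ValueError("window_length and slide_interval must be positive")
    results = []
    for end in range(slide_interval, len(batches) + 1, slide_interval):
        start = max(0, end - window_length)
        window = batches[start:end]
        results.append(sum(sum(batch) for batch in window))
    return results


if __name__ == "__main__":
    stream = [[1, 2], [3], [4, 5], [6]]
    print(sliding_window_sums(stream, window_length=3, slide_interval=1))
    print(sliding_window_sums(stream, window_length=3, slide_interval=2))
```

Because consecutive windows overlap whenever the slide interval is shorter than the window length, the same batch contributes to several results.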

Costs

Price: $2,595.00
Discounted Price: $1,686.75

Why choose Trivera Technologies LLC?

Over 25 years of technology training expertise.

Robust portfolio of over 1,000 leading edge technology courses.

Guaranteed to run courses and flexible learning options.

Trivera Technologies LLC

7862 West Irlo Bronson Highway

STE 626

Kissimmee FL 34747

844.475.4559

Trivera Technologies

Trivera Technologies is an IT education services and courseware firm that offers a wide range of professional technical education services, including end-to-end IT training development and delivery, skills-based mentoring programs, new hire training and re-skilling services, courseware licensing and...
