Professional Course

Hadoop Developer Foundation | Explore Hadoop, HDFS, Hive, Yarn & More

Length
4 days

Course description

Hadoop Developer Foundation | Working with Hadoop, HDFS, Hive, Yarn, Spark and More explores processing large data streams in the Hadoop Ecosystem. Working in a hands-on learning environment, students will learn techniques and tools for ingesting, transforming, and exporting data to and from the Hadoop Ecosystem, as well as for processing data using MapReduce and other critical tools including Hive and Pig. Towards the end of the course, we’ll introduce other useful tools such as Spark and Oozie and discuss essential security in the ecosystem.

This “skills-centric” course is about 50% hands-on lab and 50% lecture, designed to train attendees in core big data and Spark development skills, coupling the most current, effective techniques with the soundest industry practices. Throughout the course, students will be led through a series of progressively advanced topics, where each topic consists of lecture, group discussion, comprehensive hands-on lab exercises, and lab review.

Who should attend?

This is an intermediate-level course geared for experienced developers seeking to become proficient in Hadoop, Spark, and related technologies. Attendees should be experienced developers who are comfortable with Python. They should also be able to navigate the Linux command line and have basic knowledge of Linux editors (such as vi or nano) for editing code.

In order to gain the most from this course, attending students should be:

Familiar with basic Python programming

Comfortable in a Linux environment (able to navigate the Linux command line and edit files using vi or nano)

Training content

Day One

Introduction to Hadoop

  • Hadoop history, concepts
  • Ecosystem
  • Distributions
  • High-level architecture
  • Hadoop myths
  • Hadoop challenges
  • Hardware and software
  • Lab: first look at Hadoop

HDFS

  • Design and architecture
  • Concepts (horizontal scaling, replication, data locality, rack awareness)
  • Daemons: NameNode, Secondary NameNode, DataNode
  • Communications and heartbeats
  • Data integrity
  • Read and write path
  • NameNode High Availability (HA), Federation
  • Labs: Interacting with HDFS

Day Two

YARN

  • YARN Concepts and architecture
  • Evolution from MapReduce to YARN
  • Labs: Running a sample YARN program

Data Ingestion

  • Flume for logs and other data ingestion into HDFS
  • Sqoop for importing from SQL databases to HDFS, as well as exporting back to SQL
  • Copying data between clusters (distcp)
  • Using S3 as complementary to HDFS
  • Data ingestion best practices and architectures
  • Oozie for scheduling events on Hadoop
  • Labs: setting up and using Flume and Sqoop

HBase

  • (Covered in brief)
  • Concepts and architecture
  • HBase vs RDBMS vs Cassandra
  • HBase Java API
  • Time series data on HBase
  • Schema design
  • Labs: Interacting with HBase using the shell; programming in the HBase Java API; schema design exercise

Oozie

  • Introduction to Oozie
  • Features of Oozie
  • Oozie Workflow
  • Creating a MapReduce Workflow
  • Start, End, and Error Nodes
  • Parallel Fork and Join Nodes
  • Workflow Jobs Lifecycle
  • Workflow Notifications
  • Workflow Manager
  • Creating and Running a Workflow
  • Exercise: Create an Oozie Workflow from Terminal
  • Exercise: Create an Oozie Workflow Using Java API
  • Oozie Coordinator Sub-groups
  • Oozie Coordinator Components, Variables, and Parameters
  • Exercise: Create an Oozie Workflow from HUE

Day Three

Working with Hive

  • Architecture and design
  • Data types
  • SQL support in Hive
  • Creating Hive tables and querying
  • Partitions
  • Joins
  • Text processing
  • Labs: various labs on processing data with Hive

Hive (Advanced)

  • Transformation, Aggregation
  • Working with Dates, Timestamps, and Arrays
  • Converting Strings to Date, Time, and Numbers
  • Create new Attributes, Mathematical Calculations, Windowing Functions
  • Use Character and String Functions
  • Binning and Smoothing
  • Processing JSON Data
  • Execution Engines (Tez, MR, Spark)
  • Labs: multiple hands-on exercises on the topics above

Day Four

Hive in Cloudera (or tools of choice)

Working with Spark

Spark Basics

  • Big Data, Hadoop, Spark
  • What’s new in Spark v2
  • Spark concepts and architecture
  • Spark ecosystem (Core, Spark SQL, MLlib, Streaming)
  • Labs: Installing and running Spark

Spark Shell

  • Spark web UIs
  • Analyzing dataset – part 1
  • Labs: Spark shell exploration

RDDs (Condensed coverage)

  • RDDs concepts
  • RDD Operations / transformations
  • Labs: Unstructured data analytics using RDDs
  • Data model concepts
  • Partitions
  • Distributed processing
  • Failure handling
  • Caching and persistence
  • Lab on the above

Spark DataFrames & Datasets

  • Intro to DataFrame / Dataset
  • Programming in the DataFrame / Dataset API
  • Loading structured data using DataFrames
  • Labs: DataFrames, Datasets, Caching

Spark SQL

  • Spark SQL concepts and overview
  • Defining tables and importing datasets
  • Querying data using SQL
  • Handling various storage formats: JSON / Parquet / ORC
  • Labs: querying structured data using SQL; evaluating data formats

Spark API programming (Scala and Python)

  • Introduction to Spark API
  • Submitting the first program to Spark
  • Debugging / logging
  • Configuration properties
  • Labs: Programming in the Spark API, submitting jobs

Spark and Hadoop

  • Hadoop Primer: HDFS / YARN
  • Hadoop + Spark architecture
  • Running Spark on YARN
  • Processing HDFS files using Spark
  • Spark & Hive
  • Lab

Capstone project

  • Team design workshop
  • The class will be broken into teams
  • The teams will get a name and a task
  • They will architect a complete solution to a specific useful problem, present it, and defend the architecture based on the best practices they have learned in class

Optional Additional Topics – Please Inquire for Details

Machine Learning (ML / MLlib)

  • Machine Learning primer
  • Machine Learning in Spark: MLlib / ML
  • Spark ML overview (newer Spark 2 version)
  • Algorithms: Clustering, Classifications, Recommendations
  • Labs: Writing ML applications in Spark

GraphX

  • GraphX library overview
  • GraphX APIs
  • Labs: Processing graph data using Spark

Spark Streaming

  • Streaming concepts
  • Evaluating Streaming platforms
  • Spark Streaming library overview
  • Streaming operations
  • Sliding window operations
  • Structured Streaming
  • Continuous streaming
  • Spark & Kafka streaming
  • Labs: Writing Spark Streaming applications

Costs

  • Price: $2,595.00
  • Discounted Price: $1,686.75

Why choose Trivera Technologies LLC?

Over 25 years of technology training expertise.

Robust portfolio of over 1,000 leading edge technology courses.

Guaranteed to run courses and flexible learning options.

Trivera Technologies LLC
7862 West Irlo Bronson Highway
STE 626
Kissimmee FL 34747

Trivera Technologies

Trivera Technologies is an IT education services & courseware firm that offers a wide range of professional technical education services including: end-to-end IT training development and delivery, skills-based mentoring programs, new hire training and re-skilling services, courseware licensing and...
