Professional Course

JumpStart to Spark Programming for Developers

3 days

Course description

Apache Spark, a significant component in the Hadoop ecosystem, is a cluster computing engine used in Big Data. It can run on top of Hadoop YARN and HDFS, and it offers order-of-magnitude faster processing than MapReduce for many in-memory computing tasks. It can be programmed in Java, Scala, Python, and R (the favorite languages of Data Scientists) along with SQL-based front ends. With built-in libraries like MLlib for machine learning and GraphX for graph processing, integration with tools such as Mahout and Neo4j, and access to NoSQL data stores, rule engines and other enterprise components, Spark is a linchpin in modern Big Data and Data Science computing.

Geared for experienced developers, Spark Developer | Spark for Big Data, Hadoop & Machine Learning provides students with a comprehensive, hands-on exploration of enterprise-grade Spark programming, interacting with the significant components mentioned above to craft complete data science solutions. Students will leave this course armed with the skills they require to begin working with Spark in a practical, real-world environment.

This course is offered in support of the Python programming language but can also be offered for R or Java with advance notice and planning. Our team will work with you to coordinate the languages, tools and environment that will work best for your organization and needs. Please inquire for details. 

Learning Objectives 

This “skills-centric” course is about 50% hands-on lab and 50% lecture, designed to train attendees in core big data/Spark development skills, coupling the most current, effective techniques with the soundest industry practices. Throughout the course, students will be led through a series of progressively advanced topics, where each topic consists of lecture, group discussion, comprehensive hands-on lab exercises, and lab review.

This course introduces the practical use of the leading-edge data science technologies centered on Spark and its related tools. Working in a hands-on learning environment, students will explore:

  • Spark Ecosystem  
  • Spark Shell  
  • Spark Data structures (RDD, DataFrame, Dataset)  
  • Spark SQL  
  • Modern data formats and Spark  
  • Spark API  
  • Spark & Hadoop & Hive  
  • Spark ML overview  
  • GraphX  
  • Time-permitting: Spark Streaming  
  • Time-permitting: Optional Capstone Workshop

Trivera offers hundreds of end-to-end skills-focused courses that provide participants with the job-ready skills they require to be truly productive in a modern IT business enterprise. Our courses are available for individuals, their teams, or across their organization, for students of all skill levels and roles.  We offer an extensive online Public Course Schedule, deep catalog for Private Courses, flex-hour Mini-Camp short courses, self-paced QuickSkills courses, free webinars and more.  Trivera’s unique EveryCourse Extras and AfterCourse Extras programs, included with every course, ensure our students can put their newly-learned skills right to work, while providing them with a solid platform for continued skills-development, support and long-term growth.   For more information about our dedicated training services, public course offerings, collaborative coaching services, new hire or enterprise upskilling programs, or to see our complete list of course offerings and special offers please call us toll free at 844-475-4559. Our pricing and services are always satisfaction guaranteed.

Who should attend?

This foundation-level course is geared for intermediate-skilled, experienced Developers and Architects (with basic Python experience) who seek to become proficient in modern development skills working with Apache Spark in an enterprise data environment.

Take Before: Students should have attended the course(s) below, or should have basic skills in these areas:  

  • TTPS4800  Introduction to Python Programming 
  • TTSQLB3 Introduction to SQL (Basic familiarity is needed, not in-depth SQL skills)  

Related Courses 

  • TTSK7502 Apache Spark Primer | Hands-on Spark Essentials, Components, RDDs & More (2 days) 
  • TTSK7503 Spark Developer for Big Data, Hadoop & Machine Learning (3 days) 
  • TTSK7515 Spark for Big Data | Enterprise-Grade Spark Programming for the Hadoop & Big Data Ecosystem (5 days) 
  • TTSK7517 Big Data Developer with Spark & Cassandra (5 days) 

Learning Paths: This course is a core component of our Big Data, AI & Machine Learning Skills Path, designed to train participants of all skill levels in modern AI, Machine Learning and Big Data skills across the enterprise. We offer courses in next-level Spark, Hadoop, AI and Machine Learning, Deep Learning, Natural Language Processing, Applied Machine Learning (Chatbots, Intelligent Web) and many more related titles. Please contact us for details and next-step recommendations based on your specific roles and goals.

Training content

Spark Introduction 

  • Big data, Hadoop, Spark 
  • Spark concepts and architecture 
  • Spark components overview 
  • Labs: installing and running Spark 

The first look at Spark 

  • Spark shell 
  • Spark web UIs 
  • Analyzing dataset – part 1 
  • Labs: Spark shell exploration 

Spark Data structures 

  • Partitions 
  • Distributed execution 
  • Operations: transformations and actions 
  • Labs: Unstructured data analytics using RDDs 

Caching 

  • Caching overview 
  • Various caching mechanisms available in Spark 
  • In memory file systems 
  • Caching use cases and best practices 
  • Labs: Benchmark of caching performance 

DataFrames and Datasets 

  • DataFrames Intro 
  • Loading structured data (JSON, CSV) using DataFrames 
  • Using schema 
  • Specifying schema for DataFrames 
  • Labs: DataFrames, Datasets, Schema 

Spark SQL 

  • Spark SQL concepts and overview 
  • Defining tables and importing datasets 
  • Querying data using SQL 
  • Handling various storage formats: JSON, Parquet, ORC 
  • Labs: querying structured data using SQL; evaluating data formats 

Spark and Hadoop 

  • Hadoop Primer: HDFS, YARN 
  • Hadoop + Spark architecture 
  • Running Spark on Hadoop YARN 
  • Processing HDFS files using Spark 
  • Spark & Hive 

Spark API 

  • Overview of Spark APIs in Scala / Python 
  • The lifecycle of a Spark application 
  • Spark APIs 
  • Deploying Spark applications on YARN 
  • Labs: Developing and deploying a Spark application 

Spark ML Overview 

  • Machine Learning primer 
  • Machine Learning in Spark: MLlib / ML 
  • Spark ML overview (newer Spark 2 version) 
  • Algorithms overview: Clustering, Classifications, Recommendations 
  • Labs: Writing ML applications in Spark 

GraphX 

  • GraphX library overview 
  • GraphX APIs 
  • Creating a graph and navigating it 
  • Shortest distance 
  • Pregel API 
  • Labs: Processing graph data using Spark 

Time Permitting Topics 

Spark Streaming 

  • Streaming concepts 
  • Evaluating Streaming platforms 
  • Spark streaming library overview 
  • Streaming operations 
  • Sliding window operations 
  • Structured Streaming 
  • Continuous streaming 
  • Spark & Kafka streaming 
  • Labs: Writing Spark Streaming applications 

Optional Capstone Workshop 

  • Attendees will work on solving real-world data analysis problems using Spark 

Course delivery details

Student Materials: Each student will receive a Student Guide with course notes, code samples, step-by-step written lab instructions, software tutorials, diagrams and related reference materials and links (as applicable). Students will also receive related (as applicable) project files, code files, data sets and solutions required for any hands-on work. 

Lab Setup Made Simple. All course labs and solutions, data sets, software, detailed courseware, lab guides and resources (as applicable) are provided for attendees in our easy-access, no-installation-required remote lab environment. Our tech team will help set up, test and verify lab access for each attendee prior to the course start date, ensuring a smooth start to class and a successful hands-on course experience for all participants.


  • Price: $2,195.00
  • Discounted Price: $1,426.75

Why choose Trivera Technologies LLC?

Over 25 years of technology training expertise.

Robust portfolio of over 1,000 leading edge technology courses.

Guaranteed to run courses and flexible learning options.

Trivera Technologies LLC
7862 West Irlo Bronson Highway
STE 626
Kissimmee FL 34747

Trivera Technologies

Trivera Technologies is an IT education services & courseware firm that offers a wide range of professional technical education services including: end-to-end IT training development and delivery, skills-based mentoring programs, new hire training and re-skilling services, courseware licensing and...
