Professional Course

Spark for Data Science | Analyzing Big Data With Spark

3 days

Course description

Apache Spark is a powerful, open-source processing engine for data in a Hadoop cluster, optimized for speed, ease of use, and sophisticated analytics. The Spark framework supports streaming data processing and complex, iterative algorithms, enabling applications to run up to 100x faster than traditional Hadoop MapReduce programs. With Spark, you can write sophisticated parallel applications that support faster decisions, better decisions, and real-time actions across a wide variety of use cases, architectures, and industries.

Apache Spark for Data Science is a three-day, hands-on course geared for technical business professionals who wish to solve real-world, data-related problems using Apache Spark. This course explores using Apache Spark for common data-related activities. Students will learn to build unified big data applications combining batch, streaming, and interactive analytics on all their data.

NOTE: The hands-on treatment and focus in this course are geared towards the data science aspects of Spark and related tools. Students who want a more developer-oriented edition of this course should consider the TTSK7503 Spark Developer | Spark for Big Data, Hadoop & Machine Learning course, which aligns in subject coverage but is geared for developers instead of data scientists.

Course Objectives

This course is approximately 50% hands-on, combining expert lecture, real-world demonstrations, and group discussions with machine-based practical labs and exercises. Working in a hands-on learning environment led by our expert practitioner, students will explore:

  • Spark Essentials
  • DataFrames
  • Spark SQL
  • Spark MLlib
  • Spark Streaming
  • Streaming with Kafka
  • Data Flow with NiFi
  • Spark GraphX
  • Performance and Tuning
  • Cluster Mode
  • Spark - the Big Picture

Trivera offers hundreds of end-to-end skills-focused courses that provide participants with the job-ready skills they require to be truly productive in a modern IT business enterprise. Our courses are available for individuals, their teams, or across their organization, for students of all skill levels and roles.  We offer an extensive online Public Course Schedule, deep catalog for Private Courses, flex-hour Mini-Camp short courses, self-paced QuickSkills courses, free webinars and more.  Trivera’s unique EveryCourse Extras and AfterCourse Extras programs, included with every course, ensure our students can put their newly-learned skills right to work, while providing them with a solid platform for continued skills-development, support and long-term growth.   For more information about our dedicated training services, public course offerings, collaborative coaching services, new hire or enterprise upskilling programs, or to see our complete list of course offerings and special offers please call us toll free at 844-475-4559. Our pricing and services are always satisfaction guaranteed.

Who should attend?

This is an introductory-level and beyond course. Typical attendees include systems administrators, testers, and those in technical, data-related roles who need to learn to use Spark for data analysis or data processing.

Attending students should have the following background:

  • Basic knowledge of Python Programming (or students who know R and can pick up Python easily)
  • Basic prior exposure to Java syntax (those without that background can copy and paste the labs)
  • Introduction to SQL (familiarity with SQL basics)
  • Basic knowledge of Statistics and Probability & Data science

Training content

Getting Started

  • Our Data and our problem set
  • Accessing the cluster, the data, and the tools
  • The Continuous Workshop approach
  • "Let's build a model together"
  • Focus on analysis, exploration, data munging, algorithms
  • Tooling and fundamentals as necessary to get the job done

Spark Overview

  • Data Science: The State of the Art
  • Hadoop, YARN, and Spark
  • Architectural Overview
  • MLlib Overview
  • HDFS data - Accessing
  • Lab Focus
  • Working with HDFS data
  • Distributed vs. Local Run Modes
  • Spark vs. Other tools (when is Spark the right tool for the job?)
  • Spark vs. SAS
  • Spark Languages (Java, R, Python, and Scala)
  • Hello, Spark

Spark Essentials

  • Spark Core
  • Spark SQL
  • Spark and Hive
  • Lab
  • MLlib
  • Spark Streaming
  • Spark API

DataFrames

  • DataFrames and Resilient Distributed Datasets (RDDs)
  • Partitions
  • Adding variables to a DataFrame
  • DataFrame Types
  • DataFrame Operations
  • Dependent vs. Independent variables
  • Map/Reduce with DataFrames
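
As a rough illustration of the map/reduce pattern named in the last topic above, here is a minimal plain-Python sketch. It does not use Spark or the course's lab code; the sample records and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`) are hypothetical and exist only to show the three conceptual steps.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sample records: (product, amount) pairs standing in for rows.
sales = [("apples", 3.0), ("pears", 1.5), ("apples", 2.0), ("pears", 4.0), ("plums", 1.0)]

def map_phase(records):
    """Map: emit one (key, value) pair per input record."""
    return [(product, amount) for product, amount in records]

def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

def reduce_phase(groups):
    """Reduce: fold each key's values into a single total."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}

totals = reduce_phase(shuffle_phase(map_phase(sales)))
print(totals)
```

In a distributed engine, the map and reduce steps would typically run in parallel across partitions, with the grouping step moving data between them; this single-process sketch only shows the logic.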

Spark SQL

  • Spark SQL Overview
  • Data stores: HDFS, Cassandra, HBase, Hive, and S3
  • Table Definitions
  • Queries

Spark MLlib

  • MLlib overview
  • MLlib Algorithms Overview
  • Classification Algorithms
  • Regression Algorithms
  • Lab Focus
  • Brief Comparison to SAS
  • Here's your split, how to tune regression
  • Decision Trees and forests
  • Lab Focus
  • Brief Comparison to SAS
  • Stepwise approach to Decision Trees
  • Working with Exit Criteria
  • Recommendation with ALS
  • Clustering Algorithms
  • Lab Focus
  • Key Clustering Algorithms
  • Choosing Clustering Algorithms
  • Working with key algorithms
  • Machine Learning Pipelines
  • Linear Algebra (SVD, PCA)
  • Statistics in MLlib

Spark Streaming

  • Streaming overview
  • Real-time data ingestion
  • State
  • Window Operations
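
To give a feel for the window operations topic above, here is a minimal plain-Python sketch of a sliding-window sum over micro-batches. It is not Spark Streaming code; the function `sliding_window_sums`, its parameters, and the sample batches are hypothetical, chosen to mirror the general idea of a window length and a slide interval.

```python
from collections import deque

def sliding_window_sums(batches, window_size, slide):
    """Return (batch_index, total) pairs.

    Each total covers the most recent `window_size` batches and is
    emitted once every `slide` batches.
    """
    window = deque(maxlen=window_size)  # oldest batch total drops off automatically
    results = []
    for index, batch in enumerate(batches, start=1):
        window.append(sum(batch))
        if index % slide == 0:
            results.append((index, sum(window)))
    return results

# Hypothetical micro-batches of event values arriving over time.
batches = [[1, 2], [3], [4, 5], [6], [7, 8], [9]]
window_sums = sliding_window_sums(batches, window_size=3, slide=2)
print(window_sums)
```

Here the window spans three batches and slides forward every two, so consecutive results overlap by one batch.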

Streaming with Kafka

  • Kafka overview
  • Kafka and Spark Streaming

Data Flow with NiFi

  • Apache NiFi overview
  • NiFi data flows with Spark/R

Spark GraphX

  • GraphX overview
  • ETL with GraphX
  • Graph computation

Performance and Tuning

  • Broadcast variables
  • Accumulators
  • Memory Management

Cluster Mode

  • Standalone Cluster
  • Masters and Workers
  • Configurations
  • Working with large data sets

Spark - the Big Picture

  • Spark in Real-Time and near-Real-Time Decision Support Systems
  • Spark in the Enterprise
  • Best Practices

Course delivery details

Our course materials include more than a simple slideshow presentation handout. Each student will receive a comprehensive course Student Guide, complete with detailed course notes, code samples, software tutorials, diagrams, and related reference materials and links. Our courses also include our detailed Student Workbook, with step-by-step hands-on lab instructions, project files (as necessary), and solutions, clearly illustrated for users to complete hands-on work in class and to revisit to review or refresh skills at any time. Students will also receive the course setup files, project files (or code, if applicable), and solutions required for the hands-on work.


  • Price: $2,195.00
  • Discounted Price: $1,426.75

Why choose Trivera Technologies LLC?

Over 25 years of technology training expertise.

Robust portfolio of over 1,000 leading edge technology courses.

Guaranteed to run courses and flexible learning options.

Trivera Technologies LLC
7862 West Irlo Bronson Highway
STE 626
Kissimmee FL 34747

Trivera Technologies

Trivera Technologies is an IT education services and courseware firm that offers a wide range of professional technical education services, including end-to-end IT training development and delivery, skills-based mentoring programs, new hire training and re-skilling services, courseware licensing and...
