To capitalize on new opportunities and address challenges with data, companies must manage large data sets efficiently and effectively. If your organization has chosen Spark to manage its data sets, Big Data with Spark will get your data and analytics workforce up to speed.
Attendees will learn about Spark, a framework for distributed computing, along with its libraries for manipulating data and performing machine learning on data stored in Spark clusters.
We offer in-person training as well as remote training via our Live Online technology. We can blend the two formats to teach your entire team, even if they’re not all in one place.
Upon completion of the course, attendees should be able to:
Identify challenges when working with large datasets
List elements of the Hadoop and Spark big data ecosystems
Identify the advantages of using Spark for large datasets
Identify and describe the different Spark data structures
Use Spark to collect, aggregate, analyze, and model data
Describe window functions and user-defined functions (UDFs)
Explain how Spark performs different types of joins
Describe common Spark issues and optimizations
Identify and describe the different Spark elements for performing machine learning
Data at scale
Introduction to the big data ecosystem
Hadoop, MapReduce, HDFS
DataFrames and Spark SQL
Review day 1
User-defined functions (UDFs)
Common Spark issues and optimizations