
Big Data with Spark

Details

PREREQUISITES

Experience with Python and some experience with Machine Learning; some experience with SQL is optional

LENGTH

2 Days

LOCATION

On-site or Live Online

STUDENT PROFILE

Highly technical analysts or Data Scientists hoping to gain familiarity with working with data at scale

Course Description

To use data to capitalize on new opportunities and address challenges, companies must be able to manage large data sets efficiently and effectively. If your organization has chosen Spark to manage its data sets, Big Data with Spark will get your data and analytics workforce up to speed.

Attendees will learn about Spark, a framework for distributed computing, and its libraries for manipulating data and performing machine learning on Spark clusters.

We offer in-person training, as well as remote training via our Live Online technology. We can blend the two formats to teach your entire team, even if they’re not all in one place.

Course Outcomes

Upon completion of the course, attendees should be able to:

Identify challenges when working with large datasets

List elements of the Hadoop and Spark big data ecosystems

Identify the advantages of using Spark for large datasets

Identify and describe the different Spark data structures

Use Spark to collect, aggregate, analyze, and model data

Describe window functions and user-defined functions (UDFs)

Explain how Spark performs different types of joins

Describe common Spark issues and optimizations

Identify and describe the different Spark elements for performing machine learning

Training Content

DAY 1:

Data at scale

Introduction to big data ecosystem

Hadoop, MapReduce, HDFS

Spark overview

DataFrames and Spark SQL

Basic functions 

DAY 2:

Review day 1

User-defined functions (UDFs)

Window functions

Joins overview

Common Spark issues and optimizations

Machine Learning
