
Data Engineering for Data Scientists

Details

PREREQUISITES

Knowledge of Python programming, machine learning, and data science fundamentals (including basic predictive modeling, database use and querying)

LENGTH

20 hours (split into 5 or more sessions)

LOCATION

Live Online

STUDENT PROFILE

Early-stage and experienced data/applied scientists

Course Description

Help your organization improve efficiency and lower costs when bringing data science solutions to production. From programming best practices to data engineering approaches, this course equips your team members with the skills to operationalize projects and collaborate effectively across teams. This course is designed to help your team build consistent, organized data science solutions that support impactful strategic analysis and decision-making at every level of your business.

Course Outcomes

Upon completion of the course, attendees should be able to:

Estimate the complexity of algorithms and programming alternatives

Understand best practices for implementing solutions in Python, including modules, efficient data structures, version control, virtual environments, and testing

Leverage shell scripting, command-line approaches, and cloud-based solutions

Work effectively with various database implementations, including advanced querying and scripting

Scale solutions to work with large datasets

Build programming interfaces and interact with external APIs

Training Content

MODULE 1:

Programming Style

Workflow Basics

  • Bash fundamentals, command-line arguments, editors
  • Working with virtual environments
  • Collaboration and version control

Algorithms and Data Structures

  • Measuring complexity and runtime performance
  • Using efficient data structures
  • Serialization and parallel programming

Programming Concepts

  • Modules and reusable solutions
  • Object-oriented programming
  • Code organization and style
  • Testing and debugging, writing testable code

MODULE 2:

Big Data and Databases

Working with Databases

  • Review of SQL, SQL flavors
  • Creating and updating tables
  • NoSQL alternatives

Queries

  • Review basic queries and joins
  • Advanced queries and subqueries
  • Aggregate functions

Scaling Approaches for Big Data

  • Hadoop, MapReduce, HDFS
  • Using Spark: Spark SQL, Spark Streaming, ML/MLlib, GraphX

MODULE 3:

Applications and Interfaces

Shell Programming and Scripting, Job Control

Working with APIs

  • Common APIs and cloud connections
  • Creating an API with Flask
  • Interacting with JSON data
  • Working with REST APIs
  • Accessing APIs via requests
