CORPORATE DATA SCIENCE

TRAINING

Your employees know your use cases, infrastructure, and stack. We provide training to fill in the gaps, enabling your organization to capitalize on the potential already working under your roof.

Though we’re also proponents of hiring full-time data scientists, we understand that the challenge of finding, recruiting, and retaining data science talent can be a major undertaking.

Many businesses find they can save time and money by “training up” their existing talent base rather than recruiting new employees. We can help.

Selectively building the skills of your current employees – on-site with live instruction – can be a faster and more efficient path toward solving immediate business challenges. We offer data science corporate training programs that leverage the expertise of our Senior Data Scientists to bring practical and applicable skills to your teams.

  • "This is the best training course that I have ever taken in my 15 years at [my company]. By far. I have already recommended to 3 of my colleagues."

  • "I thoroughly enjoyed getting insights into how the techniques we learned could be applied for a variety of scenarios. It was great to have such knowledgeable, credible folks teaching this."

  • “We worked extensively on manipulating data in Python. I got awesome practice and experience working with Pandas data frames and carried away "data engineering" knowledge I could have only gained individually after hours of poring over code and documentation. I've taken these skills right back to my work.”

  • “The instructors provided expert training in a fun learning environment.”

  • “Not overly prescriptive - instructors gave us the freedom to try out our own hypothesis and modeling ideas. However, there was enough structure to keep us pointed in the right direction.”

Overview

Students are given an introduction to programmatic thinking using Python, which includes syntax, conditional statements, data structures, loops, and list comprehensions.
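To give a flavor of these topics, here is a minimal sketch that touches each of them. The order amounts and the 100-dollar threshold are made-up sample values for illustration, not course material.

```python
# Hypothetical sample data: order totals in dollars.
orders = [120.0, 35.5, 260.0, 80.25, 15.0]

# Loop plus conditional statement: count the orders of 100 or more.
large_count = 0
for amount in orders:
    if amount >= 100:
        large_count += 1

# List comprehension: the same filter written as a single expression.
large_orders = [amount for amount in orders if amount >= 100]

# Dictionary (a core data structure): label each order by size.
labels = {amount: ("large" if amount >= 100 else "small") for amount in orders}

print(large_count)
print(large_orders)
```

The loop and the list comprehension apply the same filter; the comprehension is the more compact form once the pattern is familiar.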

Purpose

Guide and teach individuals to dive into the content through exercises and challenges.

Prerequisites

Ideally some coding experience (but not required).

Note:
Students should have Python and Anaconda installed in advance.

Overview

Students learn about the unique aspects of Apache Spark, including RDDs, interactive shells, and Spark processes. Students gain hands-on experience working with Spark’s interactive shell, parallelization functionality, SparkSQL, Spark Streaming, and more.

Purpose

Spark has gained incredible momentum in the last few years and has become one of the largest open-source projects in data processing. In this course, students gain a fundamental understanding of the Spark paradigm, as well as hands-on experience implementing the various Spark modules.

Prerequisites

Metis Introduction to Python course (or basic Python training)

Note:
This class can be extended to 2 days if Spark MLlib (Machine Learning) instruction is requested. In that case, students are also trained on Spark’s MLlib module implementation and optimization.

Prerequisites for 2-Day Option:
Metis Intro to Python course (or basic Python training)
Metis Mastering Supervised Learning course (or basic understanding of machine learning)

Overview

Learn the basics of front-end design including HTML, CSS, and JavaScript. Learn how to get data from different public APIs including Twitter, Instagram, and Zillow.

Purpose

Web-based data is not always readily available in document form. By learning to work with APIs and BeautifulSoup, students are able to extract data directly from websites, and by first learning web fundamentals, they gain an understanding of the inherent structure of HTML – and in turn, the structure of the web. Students are then able to parse and extract specific information from a web page.

APIs allow us to sharply focus on information on specific websites. However, not all sites have dedicated public APIs; BeautifulSoup and Selenium allow us to parse and extract data from virtually any public site.
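The idea of parsing a page's HTML structure to pull out specific information can be sketched as follows. To keep the example self-contained, it uses `html.parser` from Python's standard library rather than BeautifulSoup or Selenium, and the page content and `/homes/...` paths are invented for illustration.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href attribute of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Hypothetical page content; a real scraper would download this first.
page = """<html><body>
<h1>Listings</h1>
<a href="/homes/1">First home</a>
<a href="/homes/2">Second home</a>
<a>No link here</a>
</body></html>"""

parser = LinkExtractor()
parser.feed(page)
links = parser.links
print(links)
```

BeautifulSoup offers a more convenient interface for the same kind of task, but the underlying principle is the same: the tag structure of the page tells the parser where the data lives.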

Prerequisites

Basic Python knowledge (or Metis Intro to Python Module)

Overview

Gain understanding of the mathematical theory behind Ordinary Least Squares (OLS) Linear Regression. Within the scope of OLS, we’ll study the following: how to evaluate models, predictive feature treatment, feature selection, regularization, model optimization methods, as well as the underlying assumptions of the OLS model.

Students also learn how to apply regression modeling directly with Python packages including scikit-learn and statsmodels.
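As a small illustration of the theory, the sketch below fits a one-feature OLS model using the closed-form solution. It deliberately uses plain Python instead of scikit-learn or statsmodels, and the data points are made-up values that lie exactly on the line y = 2x + 1.

```python
def fit_simple_ols(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data generated from y = 2x + 1 with no noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_simple_ols(xs, ys)
score = r_squared(xs, ys, slope, intercept)
print(slope, intercept, score)
```

With several features, the same least-squares idea is solved with matrix methods, which is where packages such as scikit-learn and statsmodels come in.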

Purpose

Upon understanding the theory and application methods of linear regression, students are able to perform predictions and forecasting, as well as gain insight into the relative strength of the linear regression model's predictive features.

Prerequisites

Metis Exploratory Data Analysis course (or basic Python – including Pandas, and basic stats)

Overview

Students learn how to work directly with NoSQL (Not Only SQL) databases via the MongoDB database, as well as how to work with large, unstructured datasets. They learn how to implement commands for manipulating and querying data, and launch their own Amazon Web Services (AWS) Elastic Compute Cloud (EC2) instance (i.e., a virtual computer in the cloud).

Purpose

By learning how to launch their own EC2 instance, students gain hands-on experience installing and managing their own virtual computer. (Note: We also provide students with access to free storage (8 GB) for one year.)

Students also install their own MongoDB database on the cloud and have the opportunity to learn NoSQL syntax, schemas, and queries while working with large datasets.

Prerequisites

Ideally some coding experience (but not required)

Overview

Students learn the intricacies of the Flask microframework including templates, REST interface, and database connection.

Purpose

Building on their knowledge of Python and HTML, students learn how to create interactive web applications via the Python/Flask framework.

Prerequisites

Metis Intro to Python course (or basic Python training)

Experience writing HTML, CSS, and basic JavaScript

Overview

Students learn the structure of relational databases and how to develop SQL schemas via the PostgreSQL database. They also learn how to manage data with SQL and will implement complex commands for manipulating and querying data.
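A schema and a query of the kind described above might look like the following minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in so it runs without a database server; the course itself works with PostgreSQL, whose SQL dialect differs in some details. The `customers` and `orders` tables and their rows are hypothetical.

```python
import sqlite3

# In-memory database used only for this illustration.
conn = sqlite3.connect(":memory:")

# Schema: two related tables linked by a foreign key.
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    """CREATE TABLE orders (
           id INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customers(id),
           amount REAL NOT NULL
       )"""
)

# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1, 50.0), (1, 25.0), (2, 40.0)],
)

# Join the tables and aggregate: total order amount per customer.
totals = conn.execute(
    """SELECT c.name, SUM(o.amount)
       FROM customers AS c
       JOIN orders AS o ON o.customer_id = c.id
       GROUP BY c.name
       ORDER BY c.name"""
).fetchall()
conn.close()

print(totals)
```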

Additionally, students learn how to launch their own Amazon Web Services (AWS) Elastic Compute Cloud (EC2) instance (i.e., a virtual computer in the cloud).

Purpose

By learning how to launch their own EC2 instance, students gain hands-on experience installing and managing their own virtual computer. (Note: We provide students with access to free storage (8 GB) for one year.) Students install their own PostgreSQL database on the cloud and have the opportunity to learn SQL syntax, schemas, and complex queries while working with large datasets.

Prerequisites

Ideally some coding experience (but not required).

Overview

Gain experience processing large datasets. We discuss MapReduce parallelization, the HDFS data storage system, how to query large datasets via Hive, and more. Students also learn how to launch their own Amazon Web Services (AWS) Elastic Compute Cloud (EC2) instance (i.e., a virtual computer in the cloud). They then learn how to set up a pseudo-distributed, single-node Hadoop cluster utilizing their EC2 instance.

Purpose

Students get a Hadoop installation up and running so they can work with big datasets directly via the Hadoop and Hive software. They gain insight into the Hadoop MapReduce paradigm, as well as learn best practices for working with massive datasets.
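The MapReduce paradigm can be illustrated with the classic word-count example. The sketch below runs in a single Python process and only mimics the three phases; it does not use Hadoop, and the input lines and function names are invented for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


# Hypothetical input; on a cluster each line could live on a different node.
lines = ["big data big ideas", "data beats opinions"]

mapped = [pair for line in lines for pair in map_phase(line)]
grouped = shuffle_phase(mapped)
word_counts = dict(reduce_phase(key, values) for key, values in grouped.items())

print(word_counts)
```

Because each call to `map_phase` and `reduce_phase` depends only on its own input, a framework can run many of them in parallel across machines, which is the point of the paradigm.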

Prerequisites

Metis Intro to Python course (or basic Python training)

Some knowledge of SQL (but not required)

Overview

Review probability and statistics including basic probability theory, conditional probability, distributions, and hypothesis testing. Students develop a “Data Science Toolkit” by directly working with a number of Python packages including IPython Notebook, Pandas, and matplotlib. (Optional add-on: Git/GitHub, Unix.)

Purpose

Using the Toolkit, students explore a chosen dataset with Python and Pandas, gaining an understanding of the data's main characteristics. Matplotlib enables them to visualize the data and explore the underlying assumptions of the statistical models and the data's distributions.

Prerequisites

Metis Intro to Python course (or basic Python training)

Overview

Learn the basics of how HTML, CSS, and JavaScript contribute to front-end design. Students learn how to work with D3, a JavaScript library that allows development of dynamic, data-driven visualizations. We focus on the unique functionality of D3, including the binding of data to page elements with data joins.

Purpose

By learning how to work with D3, students are able to enhance their storytelling with data by developing beautiful, interactive, and dynamic data visualizations.

Prerequisites

Experience writing HTML, CSS, and basic JavaScript

Overview

This module combines the Big Data Processing with Spark and Hadoop/Hive on the Cloud modules.

Purpose

We discuss best practices for working with massive datasets, as well as which platform to use in which situation. Students get a Hadoop installation up and running and start working with big datasets directly via the Hadoop and Hive software, all the while gaining insight into the Hadoop MapReduce paradigm. Additionally, students gain a fundamental understanding of the Spark paradigm and get hands-on experience implementing various Spark modules.

Prerequisites

Metis Intro to Python course (or basic Python training)

Knowledge of SQL (but not required)

Note: This class can be extended to 4 days if Spark MLlib (Machine Learning) instruction is requested. In that case, students are also trained on Spark’s MLlib module implementation and optimization.


Prerequisites for 4-Day Option:

Metis Intro to Python course (or basic Python training)

Knowledge of SQL (but not required)

Metis Mastering Supervised Learning course (or basic understanding of machine learning)

Overview

Learn the mathematical theory and optimization behind Classification & Regression Algorithms including Logistic Regression, K Nearest-Neighbors, Support Vector Machines, Decision Trees, Random Forest, Naïve Bayes, and Neural Networks. Learn best practices for implementing these algorithms in Python's scikit-learn package. Students also learn how to optimize models by utilizing hyperparameter selection and feature selection in an effort to minimize classification error.

Purpose

Upon developing a classification model, students can predict previously unknown categorical class labels for datasets. As just one example of what’s possible, a classifier can be built to predict categorical labels such as “safe” or “risky” for loan application data.

Prerequisites

Metis Exploratory Data Analysis course (or basic Python – including Pandas, and basic stats)

Overview

Learn the theory, mathematics, and optimization behind Clustering Algorithms including K-Means, Hierarchical Agglomerative Clustering, DBSCAN, and more. Develop best practices for implementing these algorithms, including how to use scikit-learn, discerning which algorithms and tuning parameters to try, and determining how to refine the model. This module also covers Dimension Reduction and Principal Component Analysis.

Purpose

Unsupervised Learning is often used to draw inferences from unlabeled data. Cluster analysis is built on the idea that objects of the same group will naturally “cluster” together. For example, perhaps we’d like to determine unique genres for songs given specific musical properties such as tempo, danceability, and mode. Because there are numerous clustering methods available, we also discuss when and where to use certain algorithms.

When dealing with large datasets with numerous features, we’re often faced with “the curse of dimensionality.” We’ll go over how this adversely affects the modeling and will discuss how to combat the curse using dimensionality reduction techniques such as PCA.
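To make the clustering idea concrete, here is a minimal sketch of the K-Means loop on a single feature, using song tempo as in the genre example above. It is written in plain Python rather than scikit-learn; the tempo values and the starting centroids are made-up assumptions, and a real implementation would handle many features and choose starting centroids more carefully.

```python
def k_means_1d(points, centroids, iterations=10):
    """Alternate between assigning points and updating centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: abs(point - centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster,
        # keeping the old centroid if a cluster ends up empty.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical tempos in beats per minute.
tempos = [60, 62, 65, 120, 124, 128]

centroids, clusters = k_means_1d(tempos, centroids=[60.0, 128.0])
print(centroids)
print(clusters)
```

The slow and fast songs separate into two groups without any labels being supplied, which is what makes the method unsupervised.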

Prerequisites

Metis Exploratory Data Analysis course (or basic Python – including Pandas, and basic stats)

Overview

Get an overview of the different aspects of Natural Language Processing (NLP) and the Natural Language Toolkit (NLTK). We go over word counts, n-grams, TF-IDF, sentiment analysis, and more. We discuss the idea of topic modeling and Latent Dirichlet allocation (LDA), along with Word2Vec and the Gensim Toolkit.

Purpose

Text data has incredible inherent value. For example, by performing text analysis on Amazon user reviews for a certain product, you can gain insight into how the sentiment around that product may be changing over time.

Students learn how to transform text data into quantitative data so that both supervised and unsupervised learning techniques can be applied. By utilizing NLP techniques such as Word2Vec and LDA, students learn about direct relationships between words and can identify main topics within the data.
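One common way to turn text into numbers is TF-IDF weighting, sketched below in plain Python. This uses the basic textbook formula (term frequency times the log of inverse document frequency); library implementations often use smoothed or normalized variants, so their numbers will differ. The three short "reviews" are invented sample data.

```python
import math
from collections import Counter

# Hypothetical product reviews.
docs = [
    "great product great price",
    "poor product",
    "great service",
]
tokenized = [doc.split() for doc in docs]


def tf_idf(tokenized_docs):
    """Return one {term: weight} dictionary per document."""
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for doc in tokenized_docs for term in set(doc))
    weights = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        weights.append(
            {
                term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return weights


weights = tf_idf(tokenized)
for doc_weights in weights:
    print(doc_weights)
```

Terms that appear in few documents, such as "poor", receive a higher weight than terms that appear in many, such as "product". The resulting numeric vectors are what supervised and unsupervised models consume.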

Prerequisites

Metis Exploratory Data Analysis course (or equivalent experience)

Overview

Review probability and statistics including basic probability theory, conditional probability, distributions, and hypothesis testing. Students develop a “Data Science Toolkit” by directly working with a number of Python packages including IPython Notebook, Pandas, and matplotlib. (Optional add-on: Git/GitHub, Unix.)

Discuss and illustrate the process of prototyping, designing, analyzing, and testing our data science process using examples throughout.

Purpose

Students answer questions regarding their datasets by utilizing iterative design and exploratory data analysis skills. While students may start out asking one question, through the exploratory and iterative process, they could uncover additional questions, want to introduce additional data sources, or rethink previous assumptions.

Prerequisites

Metis Intro to Python course (or equivalent)

Overview

This module combines the following mid-size modules: Getting Data from the Web (“GDW”) and Mastering Unsupervised Learning.

Purpose

By combining web scraping skills with machine learning skills, students get exposed to the entire process of data science modeling, which includes data acquisition and cleaning, modeling the data, and developing predictions using machine learning methods.

Prerequisites

Metis Exploratory Data Analysis course (or basic Python – including Pandas, and basic stats)

Overview

This module is comprised of the following mid-size modules: NoSQL On the Cloud, Mastering Unsupervised Learning, NLP and Topic Modeling, and Getting Data from the Web.

Students learn how to develop and build interactive visualizations of text clustering applications via an interactive web app. They learn how to capture data from the web by utilizing APIs and learn how to add their unstructured data into the MongoDB database.

Students also learn how to build their own app – a database-driven CRUD (Create, Read, Update, and Delete) app while utilizing Python, Flask, and PostgreSQL. Students then combine the CRUD technology with their machine learning and D3 visualization skills to develop their own predictive, interactive web app.

Purpose

Knowing how to apply clustering algorithms to text data has many practical applications including customer segmentation, document organization, classification, and visualization.

Prerequisites

Metis Intro to Python course (or basic Python training)

Should also have experience writing HTML, CSS, and basic JavaScript

Overview

This module combines the following mid-size modules: PostgreSQL On the Cloud, Mastering Supervised Learning, D3, and Flask.

Purpose

Students learn how to build their own app – a database-driven CRUD (Create, Read, Update, and Delete) app while utilizing Python, Flask, and PostgreSQL. Students then combine the CRUD technology with their machine learning and D3 visualization skills to develop their own predictive, interactive web app.

Prerequisites

Metis Intro to Python course (or basic Python training)

Should also have experience writing HTML, CSS, and basic JavaScript

Interested?

If you're interested in any of our courses, please get in touch.