
Statistical Foundations
for Data Science and Machine Learning

Statistical Foundations Overview

This course will serve as an introduction to basic statistical principles that are often used by data scientists and applied statisticians. Many of the concepts will be reinforced using the statistical programming language R, one of the two most popular languages for Data Science.

The intent of this course is to expose students to common statistical issues and teach them how to avoid statistical fallacies. We begin with a high-level overview of probability and common statistical estimates and then move on to more advanced topics like multiple hypothesis testing, independence, and sample size and power calculations, as well as bootstrapping.

By the end of the course, students will have a fundamental understanding of many of the statistical principles that underlie machine learning and data science.

This course is offered both in-person at Metis campuses and Live Online from anywhere. Sign up for a free Live Online sample class.

Considering the data science immersive bootcamp?

Part-Time Alumni can apply the amount of tuition paid for one part-time professional development course towards enrollment in an upcoming bootcamp upon admittance.


This course is open to beginners, but students should have some experience with coding (Python or R preferred but not required) and a basic understanding of calculus, linear algebra, and probability. A brief review will be provided, but prior experience will be very helpful.

Students may opt to skip the pre-work if they:

  1. Have taken an introductory statistics or probability course in college
  2. Are familiar with Linear Algebra (through either coursework or work experience)
  3. Are able to:
    • Perform a hypothesis test to determine whether a coin is fair given 100 flips
    • Calculate a confidence interval for the mean height given 100 observations
    • Explain how to test whether two events are independent
    • Use Bayes’ Rule to compute the probability of an event given another event
    • Fit a linear model in R
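As a quick self-check, the first two skills above take only a few lines of R. This is a minimal sketch; the 60-heads count and the simulated heights are illustrative values, not data from the course:

```r
# Exact binomial test: is a coin fair given 100 flips?
# Suppose we observed 60 heads (an illustrative value).
result <- binom.test(x = 60, n = 100, p = 0.5)
result$p.value  # two-sided p-value

# A 95% confidence interval for mean height from 100 observations
# (heights simulated here for the sake of a runnable example).
heights <- rnorm(100, mean = 170, sd = 10)
t.test(heights)$conf.int
```

If code like this looks unfamiliar, the pre-work below is worth doing.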

Otherwise, students should familiarize themselves with Chapters 1-6 of CK-12 Foundation’s Basic Probability and Statistics – A Short Course. Each chapter should take one to two hours.


Upon completion of the course, students will have:

An understanding of basic statistical hypothesis testing and confidence intervals.
The ability to model data using well known statistical distributions as well as handle data that is both continuous and categorical.
The ability to perform linear regression and adjust for multiple hypotheses.
An understanding of how to calculate the number of samples needed to achieve required sensitivity and specificity.
An understanding of bootstrapping and Monte Carlo simulation.

Course Structure and Syllabus

Class 1

Basic Probability, Expected Value, Variance, Point Estimates, Introduction to R

We will start the course with a review of basic probability and how to compute basic properties of a random variable such as the expected value and variance.

We will also clearly define what a point estimate is and how it differs from other kinds of statistical estimates. We will use R to examine how to compute these quantities.
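To give a flavor of the kind of computation covered in this class, here is a small sketch (using simulated data) comparing sample point estimates with the known theoretical values of a distribution:

```r
# Point estimates vs. theoretical values for a known distribution.
set.seed(42)
x <- rnorm(10000, mean = 5, sd = 2)  # E[X] = 5, Var(X) = 4

mean(x)  # point estimate of the expected value
var(x)   # point estimate of the variance (sample variance, n - 1 denominator)
```

With 10,000 draws, both estimates land close to the true values of 5 and 4.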

Class 2

Further Probability, Central Limit Theorem, Law of Large Numbers, Hypothesis Testing

We will calculate probabilities involving the binomial and normal distributions. We will explore the central limit theorem and the law of large numbers to understand how to calculate probabilities of events involving averages. This will lead us into basic hypothesis testing and an exploration of how to interpret testing results.

Class 3

P-Values, Multiple Comparisons, Bonferroni Adjustment

We will explore the formal definition of a confidence interval as well as its interpretation. We will also discuss the issue of multiple comparisons and provide an example of a false positive. We will then explain the use of a Bonferroni Adjustment as well as the False Discovery Rate.
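Both adjustments discussed here are available through R’s `p.adjust` function. A minimal sketch with made-up p-values:

```r
# Adjusting p-values for multiple comparisons with p.adjust().
pvals <- c(0.001, 0.008, 0.020, 0.040, 0.300)  # illustrative raw p-values

p.adjust(pvals, method = "bonferroni")  # multiply by the number of tests (capped at 1)
p.adjust(pvals, method = "BH")          # Benjamini-Hochberg false discovery rate
```

The Bonferroni correction controls the chance of any false positive; the BH procedure controls the expected proportion of false discoveries, which is less conservative.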

Class 4

Introduction to Regression, Prediction, Hypothesis Testing for Regression

Given a set of continuous outcomes and predictive variables, we will create a linear regression model using R. We will then explain how to use that model to generate predictions for new observations, as well as how to test whether any of the coefficients are statistically significant.
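The full fit-test-predict workflow is only a few lines in R. A sketch on simulated data (the true slope of 3 is an assumption of the example):

```r
# Fit a linear model, test coefficients, and predict for new data.
set.seed(1)
df <- data.frame(x = runif(100, 0, 10))
df$y <- 2 + 3 * df$x + rnorm(100)     # simulated: intercept 2, slope 3

fit <- lm(y ~ x, data = df)
summary(fit)$coefficients             # estimates, std. errors, t-values, p-values
predict(fit, newdata = data.frame(x = 5), interval = "prediction")
```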

Class 5

Model Selection for Regression, Backwards/Forwards, R^2 and other selection criteria

We’ll look at how to select models using a variety of selection criteria such as R^2 and adjusted R^2. We’ll also look at backward, forward, and best-subset regression. Finally, we’ll briefly cover logistic regression and how and why it’s used.
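Backward selection can be sketched with R’s built-in `step` function, which drops predictors by AIC. In this simulated example only `x1` and `x2` truly matter and `x3` is noise:

```r
# Backward stepwise selection with step() on simulated data.
set.seed(2)
n  <- 200
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
df$y <- 1 + 2 * df$x1 - 1 * df$x2 + rnorm(n)   # x3 has no real effect

full <- lm(y ~ x1 + x2 + x3, data = df)
best <- step(full, direction = "backward", trace = 0)
formula(best)                   # the noise variable is typically dropped
summary(best)$adj.r.squared
```

Note `step` uses AIC rather than R^2; the class will compare it with the other criteria listed above.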

Class 6

Categorical Data, 2x2 tables, Simpson’s Paradox

We will introduce the odds ratio for a 2x2 table as well as a statistical test for independence. We will also introduce 2x2xk tables with an example of Simpson’s paradox.
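R ships with a classic example of this paradox, the `UCBAdmissions` dataset (a 2x2x6 table of admissions by gender and department). A brief sketch of the kind of computation involved:

```r
# Simpson's paradox in R's built-in UCBAdmissions data (a 2 x 2 x 6 table).
# Aggregated over departments, admission appears to favor men...
aggregated <- margin.table(UCBAdmissions, c(1, 2))
odds_ratio <- (aggregated[1, 1] * aggregated[2, 2]) /
              (aggregated[1, 2] * aggregated[2, 1])
odds_ratio  # > 1: men's admission odds look higher overall

# ...but within most individual departments the effect disappears or reverses.
apply(UCBAdmissions, 3, function(tab)
  (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1]))
```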

Class 7

Independence, MxN tables and trend, Fisher’s Permutation Test

We will provide further examples of independence tests and introduce larger tables. Trends and more advanced categorical analysis will be covered. We will then turn to Fisher’s exact permutation test to explore what hypothesis testing can be done on small samples.
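When cell counts are small, the usual chi-squared approximation is unreliable, which is exactly where Fisher’s exact test comes in. A minimal sketch with illustrative counts:

```r
# Fisher's exact test on a small 2 x 2 table, where the chi-squared
# approximation would be unreliable (counts are illustrative).
tab <- matrix(c(8, 2,
                1, 5), nrow = 2, byrow = TRUE)
fisher.test(tab)$p.value  # exact two-sided p-value
```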

Class 8

Correlation & Causation

We will provide several examples of how to calculate correlation for both continuous and categorical variables. We will also show how to calculate confidence intervals to determine whether a correlation is significant. Finally, we will explore the “correlation implies causation” fallacy and provide some counterexamples.
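For continuous variables, R’s `cor.test` bundles the estimate, confidence interval, and significance test together. A sketch on simulated data (the true correlation of roughly 0.45 is an assumption of the example):

```r
# Pearson correlation with a confidence interval and significance test.
set.seed(3)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)  # simulated; true correlation about 0.45

ct <- cor.test(x, y)
ct$estimate   # sample correlation
ct$conf.int   # 95% CI; the correlation is significant if this excludes 0
ct$p.value
```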

Class 9

A/B Testing, Hypothesis Testing for Proportions, More General Hypotheses

Here we provide several examples of hypothesis testing as it relates to Data Science and web design. We’ll also cover hypothesis testing & confidence intervals for proportions and variance.
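A typical web A/B test reduces to a two-sample test of proportions, which R handles with `prop.test`. A sketch with made-up conversion counts:

```r
# A/B test: did variant B's signup rate beat variant A's? (illustrative counts)
conversions <- c(120, 150)   # signups in A and B
visitors    <- c(1000, 1000) # visitors shown each variant

prop.test(conversions, visitors)  # two-sample test for equal proportions
```

With these counts the observed rates are 12% and 15%; the test reports whether a difference that size is plausibly due to chance.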

Class 10

Sample Size & Power Calculation / Method of Moments Estimation

We will work through several examples of how to calculate the required sample size given a specified false-positive rate and a pre-specified power level. We will go into more detail on why it is only possible to reject or fail to reject a null hypothesis (and never to accept one). Next, we will switch gears and cover the Method of Moments, compare it to maximum likelihood estimation (MLE), and look at a few examples.
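R’s built-in `power.prop.test` does this kind of sample-size calculation directly. A sketch using illustrative conversion rates:

```r
# How many users per arm to detect a lift from 12% to 15% conversion,
# at a 5% false-positive rate with 80% power? (rates are illustrative)
n <- power.prop.test(p1 = 0.12, p2 = 0.15, sig.level = 0.05, power = 0.80)$n
ceiling(n)  # round up: observations needed in each group
```

Note how small effects drive the required sample size up quadratically, a theme the class returns to.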

Class 11

Bootstrapping, the Information Matrix & Variance Bound

We will discuss some options for dealing with small amounts of data, specifically the bootstrap method. We’ll then switch gears and touch upon the information matrix and how to calculate a theoretical lower bound on the variance of any statistic of interest.
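The core bootstrap idea fits in three lines of base R: resample the data with replacement many times and look at the spread of the recomputed statistic. A sketch on simulated skewed data:

```r
# Bootstrap: resample with replacement to estimate the sampling
# distribution of the median (simulated skewed data).
set.seed(4)
x <- rexp(50, rate = 1)   # a small, skewed sample

boot_medians <- replicate(2000, median(sample(x, replace = TRUE)))
quantile(boot_medians, c(0.025, 0.975))  # percentile bootstrap 95% CI
```

The same recipe works for statistics with no convenient closed-form standard error, which is precisely when the bootstrap is most useful.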

Class 12

Expectation-Maximization Algorithm, Bias/Variance Trade Off

We’ll explore the details of the expectation-maximization algorithm and how it’s used for estimation in the presence of latent variables. We’ll work through an analytical example and show how to carry it out in R. We will also cover the bias/variance tradeoff when modeling and the pitfalls of overfitting.
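To make the E-step/M-step structure concrete, here is a minimal EM sketch for a two-component Gaussian mixture with known unit variances; the data, true means of -2 and 2, and starting values are all assumptions of the example:

```r
# Minimal EM for a two-component Gaussian mixture (known variance 1).
set.seed(5)
x  <- c(rnorm(200, -2), rnorm(200, 2))  # simulated: true means -2 and 2
mu <- c(-1, 1); pi1 <- 0.5              # initial guesses

for (iter in 1:50) {
  # E-step: responsibility of component 1 for each point (the latent variable)
  d1 <- pi1 * dnorm(x, mu[1])
  d2 <- (1 - pi1) * dnorm(x, mu[2])
  r1 <- d1 / (d1 + d2)

  # M-step: update the mixing weight and component means
  pi1 <- mean(r1)
  mu  <- c(sum(r1 * x) / sum(r1), sum((1 - r1) * x) / sum(1 - r1))
}
mu  # should land near c(-2, 2)
```

Each iteration is guaranteed not to decrease the likelihood, which is the key property of EM discussed in class.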