Statistical Foundations
for Data Science and Machine Learning

Statistical Foundations Overview

This course will serve as an introduction to basic statistical principles that are often used by data scientists and applied statisticians. Many of the concepts will be reinforced using the statistical programming language R, one of the two most popular languages for Data Science.

The intent of this course is to expose students to common statistical issues and teach them how to avoid statistical fallacies. We begin with a high-level overview of probability and common statistical estimates and then proceed to more advanced topics such as multiple hypothesis testing, independence, sample size and power calculations, and bootstrapping.

By the end of the course, students will have a fundamental understanding of many of the statistical principles that underlie machine learning and data science.

This course is offered both in-person at Metis campuses and Live Online from anywhere. Sign up for a free Live Online sample class.

Considering the data science immersive bootcamp?

Part-Time Alumni can apply the amount of tuition paid for one part-time professional development course towards enrollment in an upcoming bootcamp upon admittance.


This course is open to beginners, but students should have some experience with coding (Python or R preferred but not required) and a basic understanding of calculus, linear algebra and probability. A brief review will be provided, but prior experience will be very helpful.

Students may opt to skip the pre-work if they:

  1. Have taken an introductory course to statistics or probability in college
  2. Are familiar with Linear Algebra (either coursework or work experience)
  3. Are able to:
    • Perform a hypothesis test to determine if a coin is fair given 100 flips
    • Calculate a confidence interval for the mean height given 100 observations
    • Explain how to test if events are independent
    • Use Bayes' rule to compute the probability of an event given another event
    • Fit a linear model in R
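As a quick self-check, the items above can be sketched in a few lines of R. The counts and data below are simulated and purely illustrative:

```r
# Hypothesis test: is a coin fair given 100 flips? (illustrative: 60 heads)
coin <- binom.test(60, 100, p = 0.5)
coin$p.value                # two-sided p-value for H0: P(heads) = 0.5

# 95% confidence interval for a mean, given 100 observations
set.seed(1)
heights <- rnorm(100, mean = 170, sd = 10)   # simulated heights in cm
t.test(heights)$conf.int

# Bayes' rule: P(A | B) = P(B | A) * P(A) / P(B), with made-up probabilities
p_A <- 0.01; p_B_given_A <- 0.9; p_B <- 0.05
p_A_given_B <- p_B_given_A * p_A / p_B

# Fit a linear model on a built-in dataset
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)
```

If each of these steps already feels routine, the pre-work can safely be skipped.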

Otherwise, students should familiarize themselves with Chapters 1-6 of CK-12 Foundation’s Basic Probability and Statistics – A Short Course. Each chapter should take 1-2 hours.


Upon completion of the course, students will have:

An understanding of basic statistical hypothesis testing and confidence intervals.
The ability to model data using well known statistical distributions as well as handle data that is both continuous and categorical.
The ability to perform linear regression and adjust for multiple hypotheses.
An understanding of how to calculate the number of samples needed to achieve required sensitivity and specificity.
An understanding of bootstrapping and Monte Carlo simulation.
Greg Ryslik

Greg Ryslik graduated summa cum laude from Rutgers College and Rutgers Business School with a triple major in mathematics, computer science and finance. He then went on to complete a Master's degree in statistics from Columbia University as well as a PhD in biostatistics from Yale University. He has extensive experience as a teacher and a tutor, and has given talks in the United States and internationally. In the fall of 2011, Greg was one of a select few chosen to participate in UCLA’s prestigious Institute for Pure and Applied Mathematics. In academia, he has helped students learn categorical data analysis, design & analysis of epidemiological models, longitudinal data analysis, introduction to statistics and calculus. He is currently an Adjunct Assistant Professor with the statistics department at the Pennsylvania State University. During his career, he has co-founded several companies, published an actuarial textbook and worked both on Wall Street and in biotech. He has written several publicly available bioinformatics software packages and is an author of numerous scientific publications in journals such as Nature and BMC Bioinformatics. More recently, he led the Data Science team for Service at Tesla Motors and is currently the Head of Data Science and Analytics at Faraday Future.

Course Structure and Syllabus

Class 1

Basic Probability, Expected Value, Variance, Point Estimates, Introduction to R

We will start the course with a review of basic probability and how to compute basic properties of a random variable such as the expected value and variance.

We will also clearly define what a point estimate is and how it differs from the underlying quantity it estimates. How to compute these properties will be examined via R.
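The flavor of this first class can be previewed with a short R sketch: the expected value and variance of a fair die, and the corresponding point estimates from a simulated sample.

```r
# Expected value and variance of a discrete random variable (a fair die)
x <- 1:6
p <- rep(1/6, 6)
ev   <- sum(x * p)             # E[X] = 3.5
varx <- sum((x - ev)^2 * p)    # Var(X) = E[(X - E[X])^2] = 35/12

# Point estimates from a sample: the sample mean and sample variance
set.seed(42)
sample_data <- rnorm(1000, mean = ev, sd = sqrt(varx))
mean(sample_data)   # point estimate of E[X]
var(sample_data)    # point estimate of Var(X)
```

The point estimates wobble around the true values; quantifying that wobble is exactly what the rest of the course is about.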

Class 2

Further Probability, Central Limit Theorem, Law of Large Numbers, Hypothesis Testing

We will calculate probabilities involving the binomial and normal distributions. We will explore the central limit theorem and the law of large numbers to understand how to calculate probabilities of events involving averages. This will lead us into basic hypothesis testing and an exploration of how to interpret testing results.
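A small illustrative sketch of these ideas in R: distribution probabilities via the built-in `pbinom`/`pnorm` functions, and a simulation showing the central limit theorem at work on a skewed distribution.

```r
# Probability calculations for binomial and normal distributions
pbinom(55, size = 100, prob = 0.5)   # P(X <= 55) for X ~ Binom(100, 0.5)
pnorm(1.96)                          # P(Z <= 1.96) for a standard normal

# CLT: averages of skewed (exponential) samples look approximately normal
set.seed(7)
means <- replicate(10000, mean(rexp(40, rate = 1)))
mean(means)   # close to the true mean, 1 (law of large numbers)
sd(means)     # close to 1 / sqrt(40), the theoretical standard error
```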

Class 3

P-Values, Multiple Comparisons, Bonferroni Adjustment

We will explore the formal definition of a confidence interval as well as its interpretation. We will also discuss the issue of multiple comparisons and provide an example of a false positive. We will then explain the use of a Bonferroni Adjustment as well as the False Discovery Rate.
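The multiple-comparisons problem is easy to demonstrate by simulation: run many tests where the null hypothesis is true and see how p-values behave before and after adjustment. A minimal sketch in R, using the built-in `p.adjust`:

```r
# 20 tests where the null is true: any p-value below 0.05 is a false positive
set.seed(123)
pvals <- replicate(20, t.test(rnorm(30), rnorm(30))$p.value)
sum(pvals < 0.05)    # raw count of nominally "significant" results

# Adjusting for multiple comparisons
p_bonf <- p.adjust(pvals, method = "bonferroni")  # Bonferroni adjustment
p_fdr  <- p.adjust(pvals, method = "BH")          # Benjamini-Hochberg (FDR)
sum(p_bonf < 0.05)   # adjusted results are far more conservative
```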

Class 4

Introduction to Regression, Prediction, Hypothesis Testing for Regression

Given a set of continuous outcomes and predictive variables, we will create a linear regression model using R. We will then explain how to use that model to generate predictions for new observations as well as test whether any of the coefficients are statistically significant.
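This workflow fits in a handful of lines of R; here is an illustrative version using the built-in `mtcars` data:

```r
# Linear regression: fuel economy (mpg) as a function of weight (wt)
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)$coefficients     # estimates, std. errors, t-values, p-values

# Predictions for new observations, with prediction intervals
new_cars <- data.frame(wt = c(2.5, 3.5))
predict(fit, newdata = new_cars, interval = "prediction")
```

The p-value column of the coefficient table is the hypothesis test for each coefficient being zero.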

Class 5

Model Selection for Regression, Backwards/Forwards, R^2 and other selection criteria

We’ll look at how to select models using a variety of selection criteria such as R^2 and adjusted R^2. We’ll also look at backwards, forwards and best subset regression. Finally, we’ll briefly cover logistic regression and how/why it’s used.
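An illustrative sketch of these three pieces in base R (`step` performs stepwise selection by AIC rather than adjusted R^2, but the mechanics are the same):

```r
# R^2 always grows with more predictors; adjusted R^2 penalizes complexity
small <- lm(mpg ~ wt, data = mtcars)
big   <- lm(mpg ~ wt + hp + qsec, data = mtcars)
summary(small)$adj.r.squared
summary(big)$adj.r.squared

# Backward selection starting from the larger model (AIC criterion)
best <- step(big, direction = "backward", trace = 0)

# Logistic regression: modeling a binary outcome (transmission type)
logit <- glm(am ~ wt, data = mtcars, family = binomial)
coef(logit)
```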

Class 6

Categorical Data, 2x2 tables, Simpson’s Paradox

We will introduce the odds ratio for a 2x2 table as well as a statistical test for independence. We will also introduce 2x2xk tables with an example of Simpson’s paradox.
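A sketch of the 2x2 machinery in R, with made-up counts for illustration:

```r
# A 2x2 table: treatment group vs outcome (illustrative counts)
tab <- matrix(c(30, 20, 15, 35), nrow = 2,
              dimnames = list(group   = c("treated", "control"),
                              outcome = c("recovered", "not")))

# Odds ratio for a 2x2 table: (a * d) / (b * c)
odds_ratio <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

# Chi-squared test of independence between group and outcome
chisq.test(tab)$p.value
```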

Class 7

Independence, MxN tables and trend, Fisher’s Permutation Test

We provide further examples of independence along with the introduction of larger tables. Trends and advanced categorical analysis will be covered. We will then go into Fisher’s exact permutation test to explore what hypothesis testing can be done on small sample sets.
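Fisher's exact test is a one-liner in R; the illustrative table below is small enough that the usual chi-squared approximation would be unreliable:

```r
# Fisher's exact test: valid even when expected cell counts are small,
# where the chi-squared approximation breaks down
small_tab <- matrix(c(8, 2, 1, 9), nrow = 2,
                    dimnames = list(c("exposed", "unexposed"),
                                    c("case", "control")))
ft <- fisher.test(small_tab)
ft$p.value    # exact two-sided p-value
ft$estimate   # conditional MLE of the odds ratio
```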

Class 8

Correlation & Causation

We will provide several examples of how to calculate correlation for both continuous and categorical variables. We will also show how to calculate confidence intervals to determine whether a correlation is statistically significant. Finally, we will explore the "correlation implies causation" fallacy and provide some counterexamples.
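For the calculation side, R's `cor.test` returns the estimate, confidence interval and significance test in one call; an illustrative example on built-in data:

```r
# Pearson correlation with a confidence interval and significance test
ct <- cor.test(mtcars$mpg, mtcars$wt)
ct$estimate    # r: strongly negative for mpg vs weight
ct$conf.int    # 95% confidence interval for the correlation
ct$p.value     # test of H0: correlation = 0

# Spearman's rank correlation, usable for ordered/ordinal data
cor(mtcars$mpg, mtcars$wt, method = "spearman")
```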

Class 9

A/B Testing, Hypothesis Testing for Proportions, More General Hypotheses

Here we provide several examples of hypothesis testing as it relates to Data Science and web design. We’ll also cover hypothesis testing & confidence intervals for proportions and variance.
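A typical web A/B test boils down to a two-sample proportion test. An illustrative sketch with made-up conversion counts:

```r
# A/B test: conversions on two versions of a web page (illustrative counts)
conversions <- c(120, 150)     # variant A, variant B
visitors    <- c(2400, 2400)
ab <- prop.test(conversions, visitors)
ab$estimate   # observed conversion rates: 0.05 vs 0.0625
ab$p.value    # two-sided test of equal proportions
ab$conf.int   # confidence interval for the difference in proportions
```

Note that an observed lift is not automatically significant; whether the p-value clears the threshold depends on the sample size, which motivates the power calculations in Class 10.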

Class 10

Sample Size & Power Calculation / Method of Moments Estimation

We will work through several examples of how to calculate the required sample size given a specific level of false positives and a pre-specified power level. We will go into more detail on why it’s only possible to reject or fail to reject a null hypothesis (and not to accept one). Next, we will switch gears and cover Method of Moments, compare it to MLE and take a look at a few examples.
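R ships with these calculations built in; the effect sizes below are illustrative:

```r
# Sample size per group to detect a difference in means of 0.5 (sd = 1)
# at a 5% significance level with 80% power
pt <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.8)
pt$n   # n per group (about 64)

# Sample size per group to detect a lift from 5% to 6.25% conversion
power.prop.test(p1 = 0.05, p2 = 0.0625, sig.level = 0.05, power = 0.8)

# Method of moments for an exponential: rate estimated as 1 / sample mean
set.seed(99)
x <- rexp(500, rate = 2)
rate_mom <- 1 / mean(x)   # for the exponential, MoM coincides with the MLE
```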

Class 11

Bootstrapping, the Information Matrix & Variance Bound

We discuss options for dealing with small amounts of data, specifically the bootstrap method. We’ll then switch gears and touch upon the information matrix and how to calculate a theoretical lower bound on the variance of an estimator.
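The core bootstrap idea fits in a few lines of base R: resample the observed data with replacement and look at the spread of the recomputed statistic. An illustrative sketch for the median, a statistic with no simple closed-form standard error:

```r
# Bootstrap: resample with replacement to estimate the sampling
# distribution of a statistic (here, the median)
set.seed(2024)
x <- rexp(50, rate = 1)   # a small, skewed sample
boot_medians <- replicate(2000, median(sample(x, replace = TRUE)))
sd(boot_medians)                          # bootstrap standard error
quantile(boot_medians, c(0.025, 0.975))   # percentile bootstrap 95% CI
```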

Class 12

Expectation-Maximization Algorithm, Bias/Variance Trade Off

We’ll explore the details of the expectation-maximization algorithm and how it’s used for estimation in the presence of latent variables. We’ll work through an analytical example as well as how to implement it in R. We will also cover the bias/variance tradeoff in modeling and the pitfalls of overfitting.
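To make the latent-variable idea concrete, here is a hand-rolled EM sketch for a two-component Gaussian mixture, simplified by fixing the (true) standard deviation at 1 and updating only the means and mixing weight:

```r
# EM for a two-component Gaussian mixture (simplified illustrative sketch)
set.seed(5)
x <- c(rnorm(200, mean = 0), rnorm(200, mean = 4))   # component labels are latent

mu <- c(-1, 1); sigma <- 1; pi1 <- 0.5               # initial guesses
for (iter in 1:100) {
  # E-step: posterior probability ("responsibility") that each point
  # came from component 1
  d1 <- pi1 * dnorm(x, mu[1], sigma)
  d2 <- (1 - pi1) * dnorm(x, mu[2], sigma)
  gamma <- d1 / (d1 + d2)
  # M-step: update the mixing weight and means given the responsibilities
  pi1 <- mean(gamma)
  mu  <- c(sum(gamma * x) / sum(gamma),
           sum((1 - gamma) * x) / sum(1 - gamma))
}
mu   # converges near the true component means, 0 and 4
```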