Statistical Foundations Overview
This course serves as an introduction to basic statistical principles that are commonly used by data scientists and applied statisticians. Many of the concepts will be reinforced using the statistical programming language R, one of the two most popular languages for data science.
The intent of this course is to expose students to common statistical issues and teach them how to avoid statistical fallacies. We begin with a high-level overview of probability and common statistical estimates, then move on to more advanced topics such as multiple hypothesis testing, independence, sample size and power calculations, and bootstrapping.
By the end of the course, students will have a fundamental understanding of many of the statistical principles that underlie machine learning and data science.
This course is open to beginners, but students should have some coding experience (Python or R preferred, but not required) and a basic understanding of calculus, linear algebra, and probability. A brief review will be provided, but prior experience will be very helpful.
Students may opt to skip the pre-work if they:
- Have taken an introductory course in statistics or probability in college
- Are familiar with linear algebra (through coursework or work experience)
- Are able to:
  - Do a hypothesis test to determine whether a coin is fair given 100 flips
  - Calculate a confidence interval for the mean height given 100 observations
  - Explain how to test whether events are independent
  - Use Bayes' rule to find the probability of an event given another event
  - Fit a linear model in R
Otherwise, students should familiarize themselves with Chapters 1-6 of CK-12 Foundation’s Basic Probability and Statistics – A Short Course. Each chapter should take 1 to 2 hours.
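As a rough self-check for three of the pre-work skills (testing a coin for fairness, a confidence interval for a mean, and Bayes' rule), here is a minimal sketch in Python using only the standard library. The function names are hypothetical, and the flip count, simulated heights, and probabilities are made-up illustrative values, not course data. The confidence interval assumes a normal approximation, which is reasonable with 100 observations; a t-based interval (what R's `t.test` reports) would be slightly wider. In R, `binom.test(61, 100)` performs the corresponding exact coin test.

```python
import random
from math import comb
from statistics import NormalDist, fmean, stdev


def fair_coin_p_value(heads: int, n: int) -> float:
    """Exact two-sided binomial test of H0: P(heads) = 0.5."""
    probs = [comb(n, k) * 0.5 ** n for k in range(n + 1)]
    observed = probs[heads]
    # Sum the probabilities of all outcomes no more likely than the observed one
    # (the small tolerance guards against floating-point ties).
    return sum(p for p in probs if p <= observed * (1 + 1e-7))


def mean_confidence_interval(xs: list[float], level: float = 0.95) -> tuple[float, float]:
    """Normal-approximation confidence interval for the population mean (assumption)."""
    n = len(xs)
    center = fmean(xs)
    std_err = stdev(xs) / n ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return center - z * std_err, center + z * std_err


def bayes(p_b_given_a: float, p_a: float, p_b_given_not_a: float) -> float:
    """P(A | B) via Bayes' rule, expanding P(B) with the law of total probability."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b


if __name__ == "__main__":
    # Hypothetical: 61 heads observed in 100 flips.
    print("coin p-value:", fair_coin_p_value(61, 100))

    # Hypothetical heights in cm, simulated rather than measured.
    random.seed(0)
    heights = [random.gauss(170, 10) for _ in range(100)]
    print("95% CI for mean height:", mean_confidence_interval(heights))

    # Hypothetical: P(A) = 0.01, P(B | A) = 0.9, P(B | not A) = 0.05.
    print("P(A | B):", bayes(0.9, 0.01, 0.05))
```

With 61 heads the two-sided p-value falls below 0.05, so a fair coin would be rejected at the 5% level; exactly 50 heads gives a p-value of 1.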
Upon completion of the course, students will have: