Bootcamp Prep Course
Introduction to Data Science
Part-Time, Live Online Course
This course takes you one step closer to becoming a data scientist by offering a subset of the topics covered in our Data Science and Analytics Bootcamps. You'll get a well-rounded intro to the core concepts and technologies taught within the bootcamp, including basic machine learning principles and hands-on coding experience. Plus, you'll put it all into practice through a mini data science project of your own. We'll cover the following:
Data acquisition, cleaning, and aggregation
Exploratory data analysis and visualization
Model creation and validation
Basic statistical and mathematical foundations for data science
Course designed by Sergey Fogelson, VP of Analytics and Measurement Sciences, Viacom
The Intro to Data Science instructor’s enthusiasm and ability to explain complex topics made this a great introduction to the fundamentals of data science and Python programming. This course helped prep me for the Metis data science bootcamp, and I'd highly recommend it to anyone looking to gain a better understanding of concepts taught throughout the bootcamp.
Data Scientist, JPMorgan Chase & Co
Who the course is designed for:
You have a strong desire to learn data science through top-quality instruction, a basic understanding of data analysis techniques, and an interest in improving your ability to tackle data-rich problems in a systematic, principled way. This course provides structure and accountability to ensure you stay on track, finish strong, and achieve your desired outcomes.
An understanding of problems solvable with data science and an ability to attack them from a statistical perspective.
An understanding of when to use supervised and unsupervised statistical learning methods on labeled and unlabeled data-rich problems.
The ability to create data analytical pipelines and applications in Python.
Familiarity with the Python data science ecosystem and the various tools needed to continue developing as a data scientist.
Have questions? Get answers in our FAQs.
What you'll receive upon completion:
Certificate of completion
Dates & Instructors
Check back soon for our next scheduled course.
Students should have some experience with Python and some familiarity with basic statistical and linear algebraic concepts such as mean, median, mode, standard deviation, correlation, and the difference between a vector and a matrix. In Python, it will be helpful to know basic data structures such as lists, tuples, and dictionaries, and what distinguishes them (that is, when they should be used). Students should skip the pre-work if they can accomplish all of the following:
Write a program in Python that finds the most frequently occurring word in a given sentence.
Explain the difference between correlation and covariance, and why the difference between the two terms matters.
Multiply two small matrices together (e.g. 3×2 and 2×4 matrices).
Otherwise, students should complete the following pre-work (approximately 8 hours) before the first day of class:
Videos 1-6 of the Linear Algebra review from Andrew Ng’s Machine Learning course (labeled "III. Linear Algebra Review (Week 1, Optional)").
The exercises in Chapters 2 and 3 of OpenIntro Statistics. (This book is free, but there is a suggested donation. Feel free to donate an amount or set it to zero.)
Students must have a GitHub account to get access to the content. Signing up for an account on their site is free, fast, and easy.
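For reference, the three self-check tasks listed above (most frequent word, correlation versus covariance, and multiplying a 3×2 matrix by a 2×4 matrix) could be sketched in plain Python roughly as follows. This is one possible approach and not an official solution. All function names are hypothetical, and the word-splitting rule (lowercase letters and apostrophes only) is an assumption.

```python
import re
from collections import Counter


def most_frequent_word(sentence):
    """Return the most common word, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return Counter(words).most_common(1)[0][0]


def covariance(xs, ys):
    """Sample covariance: summed product of deviations over (n - 1)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance rescaled by both standard deviations."""
    std_x = covariance(xs, xs) ** 0.5
    std_y = covariance(ys, ys) ** 0.5
    return covariance(xs, ys) / (std_x * std_y)


def matmul(a, b):
    """Multiply an m x n matrix by an n x p matrix (both lists of rows)."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


heights_m = [1.5, 1.6, 1.7, 1.8]
heights_cm = [h * 100 for h in heights_m]
weights_kg = [50.0, 58.0, 64.0, 75.0]

# Covariance changes with the units of measurement; correlation does not.
cov_m = covariance(heights_m, weights_kg)
cov_cm = covariance(heights_cm, weights_kg)
corr_m = correlation(heights_m, weights_kg)
corr_cm = correlation(heights_cm, weights_kg)

# A 3x2 matrix times a 2x4 matrix gives a 3x4 matrix.
product = matmul([[1, 2], [3, 4], [5, 6]], [[1, 0, 1, 0], [0, 1, 0, 1]])
```

The height and weight figures are made-up illustration data. Rescaling heights from metres to centimetres multiplies the covariance by 100 but leaves the correlation unchanged, which is why the distinction between the two terms matters.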
Course Structure & Syllabus
CS/Statistics/Linear Algebra Short Course
We start with the basics. For CS, we briefly cover basic data structures/types, program control flow, and syntax in Python. For statistics, we go over basic probability and probability distributions, along with general properties of some common distributions. For linear algebra, we cover matrices, vectors, and some of their properties and how to use them in Python.
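As a rough illustration of the kind of material in this unit (not taken from the course itself), the sketch below touches each strand using only the Python standard library: basic data structures, summary statistics, sampling from a normal distribution, and simple vector operations. All variable names and values are made up for the example.

```python
import math
import random
import statistics

# Data structures: lists are mutable sequences, tuples are immutable,
# and dictionaries map keys to values.
scores = [72, 85, 85, 90]        # list: ordered, can grow or change
point = (3.0, 4.0)               # tuple: fixed-size record
ages = {"ana": 31, "ben": 27}    # dict: lookup by key

scores.append(98)                # allowed for a list, not for a tuple

# Statistics: common summaries of a small sample.
summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "stdev": statistics.stdev(scores),
}

# Probability: draws from a standard normal distribution should average
# out close to the distribution's mean of 0.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
sample_mean = statistics.mean(samples)

# Linear algebra: the length (norm) of a vector and a dot product.
norm = math.hypot(*point)
u, v = (1, 2, 3), (4, 5, 6)
dot = sum(a * b for a, b in zip(u, v))
```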
Exploratory Data Analysis and Visualization
We spend a considerable amount of time using the Pandas Python package to attack a dataset we’ve never seen before, uncovering some useful information from it. At this point, students decide on a course project that would benefit from the data-scientific approach. The project must involve public (freely-accessible and usable) data and must answer an interesting question, or collection of questions, about that data. (Several resources of free data will be provided.)
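A first pass over an unfamiliar dataset with Pandas often looks something like the sketch below. The tiny table here is invented purely for illustration; a real project would load a public dataset instead (for example with pd.read_csv), and the column names are assumptions.

```python
import pandas as pd

# Hypothetical toy data standing in for a real public dataset.
df = pd.DataFrame({
    "animal": ["dog", "cat", "dog", "cat", "dog"],
    "age_years": [2.0, 1.0, None, 4.0, 7.0],
    "adopted": [True, True, False, False, True],
})

# How many values are missing in each column?
missing_per_column = df.isna().sum()

# Summary statistics for a numeric column (missing values are skipped).
age_summary = df["age_years"].describe()

# Compare groups: share of adopted animals per animal type.
adoption_rate = df.groupby("animal")["adopted"].mean()
```

Checking for missing values before summarizing is a common first step, since gaps in the data affect which questions the dataset can answer.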
Data Modeling: Supervised/Unsupervised Learning and Model Evaluation
We learn about two basic statistical models classically used for prediction (supervised learning): linear regression and logistic regression. We also look at clustering with K-Means, one way to glean information from unlabeled data.
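To make the supervised/unsupervised distinction concrete, here is a minimal plain-Python sketch, assuming a single input feature: a least-squares line fit (supervised, since it uses known target values) and a one-dimensional K-Means loop (unsupervised, since it only sees the inputs). The function names are hypothetical, and in practice a library such as scikit-learn would normally be used instead.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    slope = numerator / denominator
    return slope, mean_y - slope * mean_x


def kmeans_1d(values, initial_centers, iterations=10):
    """K-Means on 1-D data: assign each value to its nearest center,
    then move each center to the mean of its assigned values."""
    centers = list(initial_centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers


# Supervised: labeled pairs (x, y) generated from y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])

# Unsupervised: no labels, just two visibly separated groups of values.
centers = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 10.0], initial_centers=[0.0, 5.0])
```

The starting centers are passed in explicitly to keep the example deterministic; real implementations usually choose them at random and may run several restarts.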
Data Modeling: Feature Selection, Engineering, and Data Pipelines
We switch gears from talking about algorithms to talk about features. What are they? How do we engineer them? And what can be done (Principal Component Analysis/Independent Component Analysis, regularization) to create and use them given the data at hand? We also cover how to construct complete data pipelines, going from data ingestion and preprocessing to model construction and evaluation.
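The idea of a pipeline with regularization can be sketched in a few lines of plain Python, assuming one feature: fit a scaler on the training data only, apply it to both training and test data, then fit a ridge-regularized line. This is an illustrative sketch with hypothetical function names, not the course's own code, and it does not cover PCA or ICA.

```python
import statistics


def fit_scaler(column):
    """Learn the mean and (population) standard deviation of a feature."""
    return statistics.mean(column), statistics.pstdev(column)


def apply_scaler(column, params):
    """Standardize values using previously learned parameters."""
    mean, std = params
    return [(x - mean) / std for x in column]


def fit_ridge_1d(scaled_xs, ys, alpha):
    """Ridge regression for one standardized feature.
    alpha = 0 is ordinary least squares; larger alpha shrinks the slope."""
    mean_y = statistics.mean(ys)
    numerator = sum(x * (y - mean_y) for x, y in zip(scaled_xs, ys))
    slope = numerator / (sum(x * x for x in scaled_xs) + alpha)
    return slope, mean_y


def run_pipeline(train_x, train_y, test_x, alpha=0.0):
    """Preprocess, fit, and predict. The scaler is fit on training data
    only so that no information from the test set leaks into the model."""
    params = fit_scaler(train_x)
    slope, intercept = fit_ridge_1d(apply_scaler(train_x, params), train_y, alpha)
    return [intercept + slope * x for x in apply_scaler(test_x, params)]


train_x, train_y = [10.0, 20.0, 30.0], [1.0, 2.0, 3.0]
unregularized = run_pipeline(train_x, train_y, [40.0], alpha=0.0)
regularized = run_pipeline(train_x, train_y, [40.0], alpha=1.0)
```

With alpha set to 0 the pipeline extrapolates the training trend exactly; with alpha set to 1 the prediction is pulled back toward the mean of the training targets, which is the shrinkage effect regularization is meant to provide.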
Data Modeling: Advanced Supervised/Unsupervised Learning
We delve into more advanced supervised learning approaches and get a feel for linear support vector machines, decision trees, and random forest models for regression and classification. We also explore DBSCAN, an additional unsupervised learning approach.
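As a small taste of tree-based models, the sketch below fits a "decision stump": a decision tree with a single split on one feature. It is a simplified illustration under stated assumptions (binary labels 0 and 1, one numeric feature, training error as the split criterion), with hypothetical function names. Full decision trees and random forests repeat and combine this kind of split many times; the other methods named above are not shown.

```python
def fit_stump(xs, labels):
    """Find the threshold and side labels with the fewest training errors.
    Returns (threshold, label_if_at_or_below, label_if_above)."""
    distinct = sorted(set(xs))
    if len(distinct) < 2:
        raise ValueError("need at least two distinct feature values")
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best = None
    for threshold in candidates:
        for left, right in ((0, 1), (1, 0)):
            errors = sum(
                (left if x <= threshold else right) != y
                for x, y in zip(xs, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left, right)
    return best[1:]


def predict_stump(stump, x):
    """Classify one value with a fitted stump."""
    threshold, left, right = stump
    return left if x <= threshold else right


stump = fit_stump([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])
prediction = predict_stump(stump, 8)
```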
Data Modeling: Advanced Model Evaluation and Data Pipelines | Presentations
We explore more sophisticated model evaluation approaches (cross-validation and bootstrapping) with the goal of understanding how we can make our models as generalizable as possible. Students complete data science projects and share learnings and discoveries.
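Both evaluation ideas come down to resampling the data. The sketch below, using only the standard library, shows one way to generate k-fold cross-validation splits and bootstrap resamples of a mean. The function names are hypothetical, and the folds are taken in order without shuffling, which is a simplification; shuffling the data first is usual in practice.

```python
import random
import statistics


def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds.
    Every item appears in exactly one test fold."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size


def bootstrap_means(data, n_resamples, seed=0):
    """Resample the data with replacement and record each resample's mean.
    The spread of these means estimates the uncertainty of the sample mean."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.choices(data, k=len(data)))
        for _ in range(n_resamples)
    ]


folds = list(k_fold_indices(10, 3))
means = bootstrap_means([1, 2, 3, 4, 5], n_resamples=200)
```

In cross-validation, a model would be fit on each set of training indices and scored on the matching test indices, and the scores averaged to estimate how well the model generalizes to unseen data.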
Live Online Interactive Learning
Learn from world-class data science practitioners.
Our Live Online instructors bring deep industry experience from a broad range of industries and companies including Viacom, Spotify, and Capital One Labs. You’ll have an Instructor and Assistant Instructor to support you throughout your learning process.
Interact with instructors and classmates in real-time.
This course is truly live, which means you can interact with the instructors and your fellow students in real-time. Stay engaged by asking questions and participating in polls and conversations, and join your course Slack channel for additional support, communication, and collaboration.
Learn online without sacrificing the value of live instruction.
The world is your classroom. Log in from wherever you are and gain access to live, interactive data science instruction that will push your career further in the right direction. In case you have to miss a class, you can access all recordings 24/7 to stay caught up and refer back.
Register for an on-demand sample class
Our 1-hour on-demand sample class is a great way to preview what the Live Online experience is like.
Nathan Grossman, an instructor of the Live Online Introduction to Data Science course, will cover a few sample topics in the on-demand class.
Python is a requirement for the course. In Python, it will be helpful to know basic data structures such as lists, tuples, and dictionaries, and what distinguishes them (that is, when they should be used). Python v3 is currently used in the course.
While there is no official homework, you can expect to spend a minimum of 3 hours per week reviewing material or working on projects. The non-class time spent will depend on your background and the course itself. Each instructor will address this on the first day of class, and there will be lab/office hours outside of class during which students and the instructor can collaborate.
Students work on a final project in this course. Here is an example project, which analyzes the likelihood of pets getting adopted in shelters, and here’s another example about predicting star ratings on Yelp.
No. We do not offer career support for students of these courses like we do for our bootcamp students, but you will gain access to our alumni community network of 1000+ data scientists. Networking events and job opportunities are posted on a regular basis in this active digital community.
Our part-time course instructors come to teach at Metis from industry and are not bootcamp instructors. Please visit the respective course pages for specific information on each instructor’s background and current jobs.
Our part-time courses typically run two nights per week over the course of 6 weeks, totaling 36 hours of instruction, but this can vary. Please see the full schedule here for the most up-to-date information. We consistently add new courses, so be sure to check back routinely.
The beauty of the Live Online format is that you’re taught by our industry-leading instructors live, but you can attend class sessions from literally anywhere you have an internet connection. Unlike some other online course options out there, which might consist of pre-recorded lectures, our courses allow for interaction with the instructor, teaching assistants, and other students – and because these are on a set schedule, you’ll be held accountable to actually attend, do the work, and learn the material (which is what you’re really here for anyway!).
The curriculum will be provided via GitHub; therefore, you must register a GitHub account. Signing up for an account on their site is free, fast, and easy. GitHub is a web-based hosting service for version control using Git.