
Metis Approach to Data Science Education (Part 1): Project-Driven, Learning by Doing

By Paul Burkard • May 10, 2017

Foreword: This is the first entry in an ongoing series detailing the Metis approach to Data Science Education. The series will cover a variety of topics from strategies and philosophy to technologies and techniques, which have been cultivated through Metis’s firsthand experience instructing many aspiring data scientists. This was written by Paul Burkard, Metis Sr. Data Scientist based in San Francisco.

_____

Data Science is an immensely broad field. So broad, in fact, that when I tell people in tech that I teach data science bootcamps, where the end goal is to teach relative novices how to be useful data scientists in a 12-week timeframe, the most common response I receive is something like: “How is it possible to teach someone to be an expert in all of those advanced topics in only 12 weeks!?” Well, the honest answer to that is: “It isn’t” – or, at least, it isn’t possible to make someone an expert on all topics.

How, then, can one expect to achieve such an ambitious goal in so little time? My goal in this post is to convince you that it’s possible to impart sufficient competence in 12 weeks and to explain how it can be done effectively using the approach that we employ at Metis. As a preview, the short answer is learned information prioritization through deliberate practice. But before we tackle the solution, allow me to delve a little bit further into the problem.

The Problem: So Much to Do, So Little Time!

From a purely theoretical perspective, the amount of content underpinning a general data science bootcamp curriculum is enormous and quite daunting. If you don’t believe me, see for yourself. Below is a partial list of the topics expected to be covered in our bootcamp and/or its associated pre-work:

 
On the left side, we have basically an undergraduate degree in mathematics. When you take into account all of the different possible topics in machine learning and some of the deep linear algebra or statistics underlying them, then you’re talking about multiple graduate courses in statistics or machine learning to properly treat them exhaustively. Similarly, the center and right look like the scaffolding for a Bachelor’s in computer science. Add to that the seemingly infinite number of big data, web, visualization, or database technologies in the marketplace today and you’re looking at coursework that could reasonably compose Master’s degrees in Mathematics, Statistics, Computer Science, or Machine Learning. Finally, if you introduce some of the most advanced topics covered, like advanced Natural Language Processing or Deep Learning (huzzah!), we’re talking potentially PhD-level topics...yikes!  

The Metis Solution: Time, Exposure, and Pragmatism

Okay, you get it, there is too much to learn and too little time, right? Not so fast. Despite the mountain of theory to wade through, the Metis approach has a few secret weapons to lean on: namely time, exposure, and pragmatism. So let’s take a moment to understand what I mean by each of these, and how they combine to create an effective environment to accelerate data science learning.


Step 1: Mastering Time

First, I’d like to consider the time component. I know what you’re thinking: “Time, you say? Isn’t that an issue, not an asset?” At first blush, perhaps I would agree. However, when you compare the structure of a university class to that of a bootcamp, you begin to realize that 12 weeks can be an awfully long time when used effectively.

In a university course, the structure is often a few hours a week of lecture from professors and (possibly) some extra follow-up sessions with TAs to reinforce content. Plus, a student usually has multiple other courses to occupy their time, energy, and brainpower during a semester. In a bootcamp environment, a student gets 40 hours a week living and breathing data science. This concentrated time and focus can be exhausting on occasion, but it pays huge dividends in the end. Additionally, the compressed timeline naturally means unfairly short deadlines both for figuring out concepts and for completing coursework (projects, homework, etc.), which, unfortunately, is also how many real-world technology jobs work!

Some familiar adages from economics and psychology are relevant here, notably “Parkinson’s Law” and “Student Syndrome.” Parkinson’s Law as applied to time roughly states that “work expands so as to fill the time available for its completion,” and Student Syndrome says what every college student knows: that there’s no motivator (or procrastination cure) quite like a hard deadline. In the context of the bootcamp, these natural psychological biases are used to students’ advantage. With little time to spare before deadlines, work has no room to expand and students can’t afford to procrastinate. Thus they learn to cut to the core of issues quickly and deliver results, simply because there’s no other choice; ultimately, the abbreviated timeframe forces students to maximize efficiency in their own learning and growth.


Step 2: Exposure to Expert Help

The second piece is exposure, which is a relatively straightforward advantage for the bootcamp. In a university setting – especially in large general courses like the math components listed above – the professors often give their lecture and then go about their day elsewhere, leaving the students to reinforce and understand the concepts for themselves (possibly with help from TAs).  

 

In the bootcamp, students have the opportunity to ask questions and work through problems 1-on-1 with the instructors – real-world data scientists – 40 hours a week for 12 straight weeks. Beyond this, instructors have a vested interest in making students truly ready to do the job of data science so they can be successfully employed after the bootcamp. Side projects and independent work are a great way to skill up as a data scientist, but there’s simply no replacement for an on-call professional to help you when you are stuck. Because of this, the additional exposure can rapidly accelerate a student’s ability to push through issues and churn out useful work.


Step 3: Pragmatism - Figure Out What’s Important!

 

Finally, the last piece of the puzzle is pragmatism, on which Metis places the most emphasis. As discussed, there are time and exposure benefits to the bootcamp model, but even so, you’re still stuck with a mountain of things to learn in little time. In order to be successful, the skill a student most needs to learn is how to cut through the extraneous information to understand what is important for the task at hand. This is what I mean when I say pragmatism, and I think it’s the most valuable skill in any data scientist’s toolset. It can include knowing which formulas and code syntax are important to memorize and which are okay to Google (most, in my opinion), which aspects are general underlying themes and which are nitty-gritty specifics, which tools make the most sense for a given job, and more. As they (non-relativistic mathematicians) say, “the shortest distance between 2 points is a straight line.” As a teacher, my goal is to prepare students to know how to take the shortest path to deliver a useful solution for data science problems that they might face in the future. If that means knowing when and how to Google Stack Overflow, so be it – that’s probably my strongest skill anyhow (only half kidding).


 

As an example, let’s consider an electrician. It is probably unlikely that your local electrician is currently a master of Maxwell’s equations for electromagnetism, which explain how electricity works. I, on the other hand, with a physics background once upon a time, could probably explain them reasonably well in theory. However, I’m still going to call my electrician before I go digging around in the wiring in my apartment. The electrician is a pragmatist, whereas, in this domain, I am a theorist. Similarly, the goal in training pragmatic data scientists is to teach them how to use the right tools for the right tasks to solve problems and deliver useful results.  


That doesn’t mean knowing Maxwell’s equations would be harmful to your electrician, but that at some level the minute details become extraneous to the task at hand. Similarly, for our data scientists-in-training, there is a certain core competency required to be valuable as a worker, and then deeper theoretical considerations that will probably end up sinking in to varying degrees for different students (and different topics). From experience, I believe all students can capably learn those core competencies and use them as a base to build more theoretical depth where they so choose. The student’s biggest challenge is to be an active learner and, to some extent, to strategize the level of theory they’ll seek on different topics. Those decisions can vary among students based on their background and desired career path, but even the most impressive technical PhDs are only going to have so much learning space in their brains for a 12-week timespan. This is why we preach pragmatism: absorb the important concepts first, and then use them as a base to build upon. Still, pragmatism is quite a difficult topic to teach, as it’s challenging to delineate all of the important and unimportant formulas, concepts, etc. For us here at Metis, the best way to learn what matters in data science is to actually do data science, which leads me to the most important part of this post: our Project-Driven Approach.


Learning to be Pragmatic: The Metis Project-Driven Approach

For anyone who has read Outliers, “deliberate practice” is the term Malcolm Gladwell popularized (it’s not actually his; it comes from the research of psychologist Anders Ericsson) for not just practicing a skill to be learned, but practicing in a way that focuses only on the aspects necessary for improving at the task at hand and discards extraneous pieces. At Metis, we echo this strategy with our emphasis on learning by doing: repeatedly working through the full lifecycle of actual data science projects is the fastest way to implicitly learn how to be pragmatic.

 


Students in the Metis Data Science Bootcamp are required to complete 5 data science projects from start to finish in order to graduate. The lectures throughout the bootcamp are very useful for laying a theoretical groundwork for machine learning and other data science aspects, but it’s in pushing through these 5 projects that students really learn what it takes to actually do data science. After all, that is the goal of the bootcamp: to give students the tools to be useful for employers. It’s in the projects where they have to figure out how to get the answers for themselves, where they have to make Google and StackOverflow their best friends, where they run into that edge case that they wouldn’t have thought of before, where they have to understand how to plan out every step of a project from hypothesis development to data acquisition, data cleaning, data exploration, data modeling, results presentation, and more. Implicitly and often almost invisibly to students, this is where they learn how to structure their thinking and frame a problem, plan a project strategy to solve that problem, and prioritize which information will most effectively help them get there.

Each time, they are going to make mistakes and get stuck, but they'll be better off for it the next time. Each time, they’ll get a little bit faster at performing each step and prioritizing their tasks. Each time, they’ll have eureka moments where they lament their stupidity on previous projects, but those become invaluable lessons that will never be forgotten. By the end, they’ll have a portfolio of 5 real data science projects that demonstrate their abilities and the confidence to tackle similar assignments that they might receive in future employment roles.

 

Below is a quick summary of each of the projects that students are required to complete during the Metis Data Science Bootcamp highlighting the new skills acquired along the way. Important to note is how the degree of autonomy given to the student increases from project to project, until on the last project they’re thrown into the wild for an end-to-end data science project that is completely their own.


Project 1: Data Exploration, Dealing with Yucky Data

For Project 1, students are given a painfully dirty dataset on New York subway ridership and assigned to groups. Each group is given just 1 week to come up with a potential client that could find something of value in that data, to make a proposal to that client for work to be done, and to do some basic data exploration to demonstrate the validity of their approach. As an example, past clients have included marketers looking for strategic ad placement or ride-sharing companies (Uber, Lyft) trying to optimize their fleet. Students are exposed right off the bat to the idea of framing their problem in a business context, and to munging through a disastrously ugly dataset to suggest solutions to their business problem. They are also introduced to data visualization techniques, both for exploring their data and for presenting what they’ve found. Because they haven’t gotten into machine learning models yet, the big gain during Project 1 is all the time spent getting to know the basics of Python, and especially Pandas, for manipulating data, things they’ll use over and over again in later projects.
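To make "munging" a little more concrete, here is a minimal sketch of the kind of cleanup involved. Everything in it is an assumption for illustration: the sample rows, the column names, and the helper names `clean_rows` and `daily_entries` are hypothetical, not the actual bootcamp dataset or code. In the bootcamp this work is done with Pandas; the sketch uses only the Python standard library so it runs without extra installs.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw export: padded header names, inconsistent station casing,
# a duplicated row, and a cumulative counter that resets partway through.
RAW = """station , date ,entries
 Main St,2017-05-01,1000
main st,2017-05-02,1450
main st,2017-05-02,1450
MAIN ST,2017-05-03,20
Main St,2017-05-04,500
"""


def clean_rows(raw_text):
    """Normalize headers and values, and drop exact duplicate rows."""
    reader = csv.reader(io.StringIO(raw_text))
    header = [name.strip().lower() for name in next(reader)]
    seen, rows = set(), []
    for record in reader:
        if not record:  # skip blank lines
            continue
        row = dict(zip(header, (value.strip() for value in record)))
        key = (row["station"].title(), row["date"], int(row["entries"]))
        if key not in seen:
            seen.add(key)
            rows.append(key)
    return sorted(rows)  # ordered by station, then date


def daily_entries(rows):
    """Turn cumulative counter readings into per-day counts per station."""
    totals = defaultdict(dict)
    previous = {}
    for station, date, counter in rows:
        if station in previous:
            delta = counter - previous[station]
            if delta >= 0:  # a negative jump means the counter was reset
                totals[station][date] = delta
        previous[station] = counter
    return dict(totals)


rows = clean_rows(RAW)
ridership = daily_entries(rows)
print(ridership)
```

The point is less the specific rules than the habit: look at the data, name each way it is dirty, and handle each case explicitly before drawing any conclusions for the client.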


Project 2: Regression, Building a Machine Learning Model

Whereas Project 1 provides students with an existing (though ugly) dataset, Project 2 ramps up the burden on the students regarding data acquisition. They’re required to scrape data from a website on movies and box office results, and then use the newly learned regression techniques to make some sort of numeric prediction from that data. Thus, as in Project 1, the problem is up to the students to define, with the caveat that regression be applied to solve it. The skill focus in this project is on data acquisition and feature/model selection to tackle their regression task. Students iterate on new models and feature sets to find the best-performing model, then present their problem statement and solution at the end. Project 2 is an individual project that runs 2 weeks.
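The iterate-and-compare loop described above can be sketched in a few lines. This is an illustrative assumption, not the bootcamp's code: the movie records are made up, and the helpers `fit_line` and `mean_squared_error` are hypothetical names. It fits a one-feature least-squares line for each candidate feature on a training split and keeps whichever feature predicts held-out movies best.

```python
# Hypothetical movie records: (budget in $M, runtime in minutes, gross in $M).
MOVIES = [
    (10, 95, 32), (20, 120, 58), (30, 101, 95), (40, 133, 118),
    (50, 88, 155), (60, 110, 178), (70, 125, 214), (80, 99, 239),
]
FEATURES = {"budget": 0, "runtime": 1}  # column index of each candidate feature
TARGET = 2                              # column index of the value to predict


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(slope, intercept, xs, ys):
    """Average squared prediction error of the fitted line on (xs, ys)."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hold out every other movie so models are scored on data they never saw.
train, test = MOVIES[0::2], MOVIES[1::2]

scores = {}
for name, column in FEATURES.items():
    slope, intercept = fit_line(
        [m[column] for m in train], [m[TARGET] for m in train]
    )
    scores[name] = mean_squared_error(
        slope, intercept, [m[column] for m in test], [m[TARGET] for m in test]
    )

best = min(scores, key=scores.get)
print(scores, "-> best feature:", best)
```

Scoring on held-out data rather than on the training data is the design choice that matters here; a real project would compare richer models and feature combinations in the same way.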


Project 3: Classification and Data Products: Packaging a Data Science Project

During Project 3, students are learning about different classification algorithms, and they’re given a selection of datasets to apply them to. Students choose the dataset that they like most, and teams are formed around each dataset to work together as if they were a data science team at a company. The datasets are meant to be worked with in SQL, which gives students hands-on practice with that invaluable technology standard. Also, we take a break from machine learning to introduce concepts of web applications and advanced data visualization (D3, plotly), combined into a “data products” mini-unit. Thus, during Project 3, students learn to use classification algorithms, work with data in SQL, divvy up tasks in a team as on a real-world project, and package up their solution into a fully operating demo with a web app hosting visualizations connected to a back-end database. An extra benefit of this is that students seeking to focus on data visualization or data engineering can do so, while those more interested in data munging and modeling can do that instead. Project 3 is a 2.5-week project.


Project 4: Advanced Concepts: Natural Language Processing

During Project 4, students are almost completely free (but not quite yet). The focus in lectures is on unsupervised learning and then natural language processing, as well as consuming data from APIs. The requirement for the project is that it involves natural language processing, and it’s recommended that students get their data from an API and demonstrate some unsupervised learning with it. Understanding how text data can be turned into the types of features they’re familiar with from earlier machine learning work (and then having to actually do it with Python) is a really valuable step for students in seeing how everything fits together in data science. Not to mention, learning to work with APIs is crucial for many data scientists. Project 4 is an individual project that clocks in at 2.5 weeks.
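As a rough illustration of turning text into familiar numeric features, the sketch below builds bag-of-words count vectors and compares documents with cosine similarity, the kind of distance that unsupervised methods such as clustering build on. The sample documents and the helper names (`tokenize`, `build_vocabulary`, `vectorize`, `cosine_similarity`) are assumptions for illustration only, and it uses just the Python standard library rather than any particular NLP toolkit.

```python
import math
import re
from collections import Counter

# Hypothetical documents standing in for text pulled from an API.
DOCS = [
    "The subway was crowded this morning",
    "Crowded subway platforms delay the morning commute",
    "The new movie broke box office records",
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(docs):
    """Sorted list of every distinct token across all documents."""
    return sorted({token for doc in docs for token in tokenize(doc)})


def vectorize(text, vocabulary):
    """Bag-of-words vector: one count per vocabulary word."""
    counts = Counter(tokenize(text))
    return [counts[token] for token in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


vocabulary = build_vocabulary(DOCS)
vectors = [vectorize(doc, vocabulary) for doc in DOCS]

subway_vs_subway = cosine_similarity(vectors[0], vectors[1])
subway_vs_movie = cosine_similarity(vectors[0], vectors[2])
print(round(subway_vs_subway, 3), round(subway_vs_movie, 3))
```

Once every document is a fixed-length numeric vector, the regression, classification, and clustering tools from earlier projects apply to text with little change.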


Project 5: Passion Project: Demonstrating Mastery

Project 5 is complete freedom. We call it their passion project because students are allowed to propose a problem on whatever interests them, have it approved, research and acquire relevant data, and deliver a solution as they see fit. Project 5 is individual and lasts 4 weeks, with the final product being presented to potential employers on career day. See here for some examples of projects that amazing Metis students have completed in the past.


Additional Benefits of the Project-Driven Approach

As discussed, the key benefit of the project-driven approach is learning how to be pragmatic and how to do data science in the real world. However, there are other benefits to it as well! First off, students learn how to frame things in a business context. From day one at Metis we stress to them that their projects are not useful if they cannot be applied to solving a valuable business problem (or at least, helping the world!). Second, students learn how to plan a project complete with appropriate deliverables and a minimum viable product (MVP). This design process should not be overlooked, as they learn to first work toward their MVP, and then iterate on further improvements as time allows. The iterative nature of the data science lifecycle is something that can be foreign to some at first but is definitely ingrained in all once they’ve worked through 5 projects. Lastly, the importance of communication cannot be stressed enough. For every project, students have to present their results in a straightforward way, such that the business value would be apparent to potential stakeholders or managers. Oftentimes these are termed “soft skills,” but they really are vital to being a useful data scientist: if you can’t communicate what you’ve done and how it is valuable, then it’s difficult for your work to help a business.


In Summary: Learn by Doing...Because It Works!

The project-driven approach has been with Metis since its inception in 2014, and it has stuck around thanks to its effectiveness in squeezing the most out of 12 weeks. Through deliberate practice on actual data science projects like those they might see in a data science job, students learn for themselves how to best prioritize information in their brains to help them deliver actionable insights for hypothetical clients. We might not have ten thousand hours to practice in the bootcamp, but the concept is the same. With enough practice will come enough hurdles, and hence subsequent triumphs over them, which lay the groundwork for future success in data science, all in a nice and tidy 12 weeks.


