Made at Metis

The Album Discoverer

Angeline Protacio
Data Scientist at Quartet Health

In the age of music-streaming services and shared playlists, listening to full albums is less popular than it once was. That's a shame, according to bootcamp graduate Angeline Protacio, who used her final project to create The Album Discoverer tool to help listeners discover entire albums.

How to Build a Voting Recommendation Engine Using Twitter Profiles

Samir Thanedar
Early/Absentee Vote Data Analyst at Biden for President

Considering the political polarization in the United States, bootcamp graduate Samir Thanedar decided to focus on politics for his final project. Instead of focusing on the national political landscape, however, he wanted to dive into local politics, given how important local elected officials are to our day-to-day lives.

Predicting Spotify Track Skips

Austin Poor

For his third of five projects during the bootcamp, graduate Austin Poor worked on what he calls "a slightly simplified version of the Spotify Sequential Skip Prediction Challenge." As he details in a blog post about the project, Spotify supplied two main sets of information for this competition.

Pneumonia Detection: Pushing the Boundaries of Human Ability with Deep Learning

Jenny Wang
Strategy and Operations Analyst at Google

Getting pneumonia may be more common than you think. According to bootcamp graduate Jenny Wang, it's the most common reason for U.S. children to be hospitalized and it's the most common cause of hospital admissions for U.S. adults other than women giving birth.

AI-Generated Guided Meditations with GPT-2

Neil Fonseca

For his final bootcamp project, Neil Fonseca fine-tuned a language model to generate audible, guided meditation sessions. In this post, Neil goes into detail on the steps he took to create his project.

Predicting Stock Performance from Quarterly Earnings Conference Calls

Linda Ju
Data Science Consultant at Slalom

Bootcamp graduate Linda Ju studied finance in business school and had five years of experience in the banking industry before attending the bootcamp. For her final project, she focused on Predicting Stock Performance from Quarterly Earnings Conference Calls.

Big Data and Machine Learning for B2B Marketers

Saleem Khan
Founder/CEO of HyperPlayn Corporation

In this project, Saleem Khan set out to help B2B marketers by providing them with valuable insights based on archive data. Marketers are always looking for accurate business information in order to better reach their targets, and they often use directories and business registries to find it. Read his post to learn more about his process and what he recommends.

Deutscher Bundestag: Who Are Your Representatives?

Erik Hafner
Consultant at Bain & Co.

Metis graduate Erik Hafner lives and works in Berlin, Germany. For his final project, he chose to examine the central body of German politics, the Bundestag. His aim was to "further increase transparency on the German parliament and its members by applying techniques from machine learning and natural language processing (NLP)."

Prospecting LA’s Backyard Houses with Machine Learning

Anu Garla

For her final project during the bootcamp, recent graduate Anupama Garla looked into answering the question: Living in Los Angeles, should I build a backyard home for extra income? She began to build a tool, which homeowners could use to determine the potential income and the feasibility of such a project. From Anupama's point of view, this tool would be beneficial to both homeowners and people like her, who are in the market to rent or own in the area. Using publicly available Airbnb and LA Geo datasets, she first built a model that predicted the income of a property.

A Content Based Live Music Recommender

Gabriel Bond

To merge his interest in live music with data science, recent graduate Gabriel Bond chose to create a content-based live music recommender during the bootcamp. "The result of this exploration utilizes unsupervised learning techniques, audio feature extraction with the LibROSA Python library, and both the Spotify and Songkick APIs to generate a playlist of songs by artists with upcoming shows in the user’s city based on the user’s favorite artists," he wrote in a blog post about the project.

Space Audio Classification: inconspicuously_scraping_NASA

Sami Ahmed

When bootcamp graduate Sami Ahmed isn't working on data science projects, he's most likely making music and thinking about his interest in audio more generally. He recently found that NASA and the University of Iowa host a "massive library of electromagnetic waves that happen to fall in the audible human frequency range," he wrote in a blog post. Looking to get his hands on this audio, he figured out how to quickly web scrape hundreds of hours of audio from NASA using Beautiful Soup.

Street Art to Fine Art

Michael Jordan

For his final project at Metis, Michael trained a convolutional neural network auto-encoder to capture the essential visual features of artwork and developed a recommendation app that would compare a user-uploaded image of street art to a corpus of more than 35,000 images of fine art and then return those images of fine art, along with associated metadata, that were most similar.

Smulemates: A Karaoke Singing Style Recommendation System

Catherine Magsino

For bootcamp graduate Catherine Magsino, karaoke is more than just an occasional hobby. "I’ve grown up singing karaoke for as long as I remember – at home, at family parties, and at get-togethers with friends. It is not only one of my favorite hobbies, but also a big part of my own culture. So naturally, I decided to focus my final project on this great pastime," she wrote in a blog post about her project.

The Art of Hiring Offshore Talent

Katherine Bell

Bootcamp graduate Katherine Bell recently dove into the hiring practices of startups in the United States, analyzing whether or not hiring offshore talent makes economic sense. In order to do so, she looked at data within the 2019 Stack Overflow Developer Survey, which included around 40,000 non-US based respondents.

Smarter Pricing for Airbnb Using Machine Learning

Alison Glazer

For her final bootcamp project, Alison Glazer looked into Airbnb's smart pricing tool, which was introduced years ago but faced immediate problems. According to Alison, the biggest issue was that price suggestions were too low and hosts noticed their revenues decrease when using the tool.

Predicting Opponent Strategy in StarCraft

Alexander Parker

This project attempts to predict an opponent’s strategy in StarCraft using real, in-game observations through a Bayesian network model (I think that is a fair label, but let me know if you disagree!).

A Simple Approach To Building a Recommendation System

Molly Liebeskind

When you think about recommendation systems, Netflix might come to mind first based on its ubiquity and power, writes bootcamp grad Molly Liebeskind in a blog post about her final project. But even while recommenders can be complex, Molly identifies two simple approaches (content-based filtering and collaborative filtering) that are good starting points to understanding how they work and to building one of your own.
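The two approaches named above can be contrasted in a minimal sketch. Everything here is an illustrative assumption rather than Molly's implementation: the toy ratings, the genre tags, and the function names are all hypothetical. Content-based filtering compares item attributes, while collaborative filtering compares users' rating patterns.

```python
from math import sqrt

# Hypothetical toy data (not from the project): user -> {item: rating}.
RATINGS = {
    "ana": {"Alien": 5, "Heat": 3},
    "ben": {"Alien": 4, "Heat": 2, "Arrival": 5},
    "cy": {"Up": 5, "Coco": 4, "Alien": 1},
}
# Hypothetical item attributes used by the content-based approach.
GENRES = {
    "Alien": {"sci-fi", "thriller"},
    "Heat": {"crime", "thriller"},
    "Arrival": {"sci-fi", "drama"},
    "Up": {"animation", "family"},
    "Coco": {"animation", "family", "music"},
}


def jaccard(a, b):
    """Content similarity: overlap between two attribute sets."""
    return len(a & b) / len(a | b)


def cosine(u, v):
    """User similarity: cosine of two rating vectors (unrated items count as 0)."""
    dot = sum(u[i] * v[i] for i in set(u) & set(v))
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def content_based(user):
    """Recommend the unseen item most similar to the user's top-rated item."""
    seen = RATINGS[user]
    favorite = max(seen, key=seen.get)
    unseen = [item for item in GENRES if item not in seen]
    return max(unseen, key=lambda item: jaccard(GENRES[favorite], GENRES[item]))


def collaborative(user):
    """Recommend the unseen item with the highest similarity-weighted rating."""
    scores = {}
    for other, their_ratings in RATINGS.items():
        if other == user:
            continue
        weight = cosine(RATINGS[user], their_ratings)
        for item, rating in their_ratings.items():
            if item not in RATINGS[user]:
                scores[item] = scores.get(item, 0.0) + weight * rating
    return max(scores, key=scores.get) if scores else None


print(content_based("ana"), collaborative("ana"))
```

On this toy data both approaches happen to suggest the same title for "ana", but for different reasons: one because it shares a genre with her favorite, the other because a user with similar ratings liked it.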

Building a Vocal Emotion Sensor with Deep Learning

Alex Muhr

Teaching machines to better understand human communication.

Navigating Media Bubbles With Data Science

Aisulu Omar

Having lived in different parts of the world, Aisulu Omar realized the importance of addressing and navigating news media bubbles. She set out to create a news recommendation platform that would provide a balanced perspective for all.

Clustering NBA Playstyles Using Machine Learning

James Fan

I love basketball. I love playing it, watching it, or arguing scenarios with friends like who would win one on one, Kobe or Lebron. I had to combine my two passions, basketball and data science, in a machine learning project.

Personalized, Generative Narratives

Nicholas Sherwin

For his capstone project, Nicholas Sherwin decided to continue working with natural language, film characterization and IBM Watson personality insights. He leveraged personality insights derived from a user's Twitter profile, in conjunction with other elements, to craft personalized content based on the user’s personality profile.

Earthquake Prediction with Machine Learning

Isaac Kim

As a California resident, Isaac Kim is familiar with the danger that earthquakes present. Having grown up near the San Andreas fault, he was taught at a young age to prepare for the inevitable "big one." Naturally, this inspired him to explore how modern computing power and machine learning methods can be employed to try and tackle earthquake prediction.

Classifying Car Images Using Features Extracted from Pretrained Neural Networks

Bhanu Yerra

Did you know? 40 million used vehicles were sold in the US last year, representing about 70% of the total vehicles sold. Given that a good portion of those sales already use online resources along various stages of purchasing, Bhanu Yerra thought a car image classification system could address several business issues for leading automotive sellers.

Mind Reading Lady Justice: Predicting court decisions using machine learning

Luke Persola

Using the Federal Contractor Misconduct Database, curated by the nonprofit Project on Government Oversight (POGO), Luke Persola set out to build a piece of machine learning software that could predict the outcome of a given case of contractor misconduct based on the other information in the database.

What To Drink Next? A Simple Beer Recommendation System using Collaborative Filtering

Medford Xie

As a self-proclaimed beer enthusiast, Medford Xie routinely found himself looking for new brews to try – but he dreaded the possibility of disappointment once actually experiencing the first sips. This often led to purchase-paralysis. For his final project at Metis, he set out "to utilize machine learning and readily available data to create a beer recommendation engine that can curate a customized list of recommendations in milliseconds."

Finding Success on Twitch

Natasha Borders, Randy Macaraeg, and Jeremy Chow

In this blog post, Natasha Borders walks you through how she built a streamer recommender for Twitch (a live streaming platform for gamers), including details on the various tools used to make the resulting app, which is available now on Heroku.

Produce2Recipe: "What Should I Cook Tonight?"

Jhonsen Djajamuliadi

After trying out a couple existing recipe recommendation apps, Jhonsen Djajamuliadi thought to himself, “Wouldn’t it be nice to use my phone to take photos of stuff in my refrigerator, and then get (personalized) recipes from them?” He decided to go for it, creating a photo-based recipe recommendation app for his final bootcamp project.

A Data-Driven Historical Analysis of Cosmo

Nora G. May

In an attempt to understand the market appeal of magazines, specifically women’s magazines, Nora G. May used data analytical tools to abstract magazines' marketing techniques. She extracted the text from magazine covers, performed NLP topic modeling, and used image processing techniques to understand graphic trends and representation.

Predicting NBA Player Salaries using Linear Regression

Dotun Opasina

For his second project at Metis, Dotun Opasina sought to predict NBA players’ salaries per season based on their statistics using linear regression. This project could be used by both individual players and managers to evaluate the impact a particular player is making on a team and to know whether to increase the player’s salary or trade the player.

Fateful Findings

Aaron Wilson
Data Scientist, Strata Decision Technology

"When we hear about an airplane accident, there are a few things we always want to know. Where did it happen? Is everyone okay? And, of course: why did it crash?" writes Aaron Wilson. Find out how he used data science to find answers.

Nuanced Analysis of LocalBitcoins Data Suggests Bitcoin is Working as Satoshi Intended

Matt Ahlborg
Consultant at Airtm

While Bitcoin showcases a vast body of anecdotal evidence that argues its utility, or potential future utility, due to the pseudonymous nature of the blockchain as well as other factors, it's still hard to come across aggregate data that shows this behavior occurring on a consistent and measurable scale. To attempt to address this problem, Ahlborg took a close look at trading volumes on the Peer-to-Peer Bitcoin trading website

Investigating Worker Exploitation in California

Ash Chakraborty
Data Scientist at Credit Sesame

At a recent DataKind SF event, Ash was rather intrigued by the challenges faced in investigating wage theft and other labor violations not just throughout the nation, but also specific to California and the Bay Area regions.

Predicting overdose mortality per US county

Joyce Lee
Data Scientist at Clover Health

The opioid epidemic is one of the major public health catastrophes for this generation of Americans; similar to what tobacco/smoking or HIV/AIDS were to earlier generations, the opioid epidemic appears to be this era’s defining public health crisis. Lee set out to build a model to predict opioid-related mortality on a county by county basis with location-based insights and interventions in mind as a larger goal.

Training a Neural Network to Detect Gestures with OpenCV in Python

Brenner Heintz

Imagine you’re hosting a birthday party. Everyone’s having a great time, music’s playing, and the party is noisy. Suddenly, it’s time for birthday cake! It’s too loud to use Alexa, and rather than hunting for your phone or a remote control, what if you could simply raise an open hand while in mid-conversation, your smart home device would recognize that gesture, and turn off the music? And with the same gesture, you could dim the lights just as the candles are lit?

Chord Classification using Neural Networks

Henri Dwyer
Data Scientist at Dataiku

Henri is currently working on classifying chords from audio using neural networks.

Hack a Designer Look by Using Convolutional Neural Networks

Jiamin Han

Although uniqueness and personalization are great selling points, keeping up with the fashion trend is still the major theme that runs throughout the retail fashion business. The goal of this project is to find affordable alternatives to a designer outfit by using convolutional neural networks and other deep learning techniques.

Obtaining Insights From Data: Optimizing an NBA Career

Aaron Frederick

Being a statistics-motivated sports fan, Frederick wanted to solve an atypical basketball problem: How can we optimize a typical basketball player’s career in the NBA? The question itself may seem open-ended, so in order to better scope this endeavor, he measured success by dollars earned.


Phillip Tan

Data Sciencing Motorcycles: Lean Assist

Josh Peng

Motorcycle Lean Assist uses a convolutional neural network to detect the lean angle of a motorcycle through image classification, providing you with rider feedback on your current lean angle so you don’t have to guess.

Estimating House Prices in San Francisco

Rui Chang
Lead Data Scientist at Target

This project estimates house prices from property features using publicly available data, and builds a web application for house price estimation.

Incorporating Curb Appeal Into Home Price Estimates Using Deep Learning

Lauren Shareshian

Lauren used Zillow metadata, natural language processing on realtor descriptions, and a convolutional neural net on home images to predict Portland home sale prices.

The Dankstimate: Cannabis Price Estimation — Part I

AJ Davis

With cannabis being legal for medicinal use in 31 states and recreational use in 9 states, there are thousands of dispensaries from which one can obtain pricing data to analyze. Davis thought it was a good time to dive into cannabis pricing to build a model that outputs a price benchmark for dispensaries (a “dankstimate” in the vein of Zillow’s “zestimate”).

Which Hotel to Recommend?

Xuan Qi
Machine Learning Engineer at Petuum, Inc.

Xuan Qi's goal for her project was to "accurately match customers with hotel inventory in this highly competitive market." On a personal note, she writes that "as a mom, when I book a hotel, I would like the hotel to be family friendly, closer to the sightseeing, and relatively quiet. But, my standards would be different booking a romantic weekend for me and my husband. We would like to pick the hotels with great food, closer to bars, and musical events are a plus."

Take Control of your Healthcare with MedTracker

Katherine Pully

MedTracker is a system to track your (psychiatric) medications and your moods and to compare what is working well for other users. Users can register as a patient or a doctor.

Travel Time Optimization with Machine Learning and Genetic Algorithm

Vladimir Lazovskiy
Data scientist working at the intersection of machine learning, content creation, and media.

In this project, Vladimir tackles the question: what is the relationship between machine learning and optimization? He explores how delivery companies can use the power of machine learning to forecast travel times between two locations and use the genetic algorithm to find the best travel itinerary for each delivery truck.

Major or Minor? Classifying the Mode of a Song

Alex Smith

Alex chose to work with music data because it is a type of audio that can evoke emotion in addition to thought. When she listens to music, she asks herself, "Why does a particular song make me feel happy or sad?" The key of a song helps determine the feeling and is made up of the tonic note and the mode. For this project, she aimed to predict the mode.

Text to Video Generation with AI

Antonia Antonova

This project aims to build a deep learning pipeline that takes text descriptions and generates unique video depictions of the content described.

Everyone Poops

Mattie Terzolo

In San Francisco, human waste is a growing issue, both for the people who run into it and for the people who have no other option than to relieve themselves on public streets. Mattie built a model that predicted where and when human waste will show up, which could be used to better inform resource allocation for programs like San Francisco’s Pitstop (a program that brings portable bathrooms to areas that have high homeless populations).

Promoting Positive Climate Change Conversations via Twitter

Tim Martin

Tim's project explores the conversations about climate change that took place on Twitter in March 2017. With 1 million tweets from 560,000 users, Tim identified people belonging to different communities and used tools such as the Twitter API, Spark, NetworkX, and Gephi to derive insight from those conversations.

Pitch Recommendation (a look into the data science process)

Vicky Szuflita

Vicky created a model dedicated to recommending pitches to the Cubs in games against the Cardinals. (Technically, this model could help any team – or any talented pitcher quite frankly – when throwing pitches against Cardinals players, but this model is specifically dedicated to her beloved Cubs.)

Nobody Knows You're a Bot

Aaron Wilson
Data Scientist, Strata Decision Technology

Every week, the New Yorker magazine runs a caption contest. Aaron Wilson has entered this contest (unsuccessfully) dozens of times. "The problem is that I’m not very funny," he writes. "But computers? Computers are funny as hell. What if I could have one write captions for me?"

Remote Sensing: An Overview of Common Pixel Classification Techniques

Alando Ballantyne
Founder & Data Scientist, Sovereign Finance

Image analysis and classification is something that Alando is passionate about (specifically as it pertains to analyzing satellite imagery to generate economic data for emerging economies). In this post, he writes about a few of the more common pixel classification techniques used in remote sensing.

SnapLoc — Places of interest in a city, from geo-tagged photos

Kalgi Shah

Kalgi created SnapLoc, a product that performs automatic image classification and spatiotemporal analysis in order to recommend places of interest when traveling in a new city.

Forecasting Uber Demand in NYC

Ankur Vishwakarma

For his final project, Ankur decided to see if he could forecast hourly Uber demand across NYC neighborhoods. In addition to time-lagged features (such as previous week’s demand), he added information specific to each neighborhood to improve predictions.
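Time-lagged features of the kind mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the choice of lags (1 hour, 1 day, 1 week), and the flat list input are hypothetical and are not taken from Ankur's project.

```python
def add_lag_features(demand, lags=(1, 24, 168)):
    """Build (features, target) rows from an hourly demand series.

    For each hour t, the features are the demand `lag` hours earlier, one
    value per lag; 168 hours corresponds to the previous week's demand.
    Hours that do not yet have a full history are skipped.
    """
    start = max(lags)
    rows = []
    for t in range(start, len(demand)):
        features = [demand[t - lag] for lag in lags]
        rows.append((features, demand[t]))
    return rows


# Hypothetical series: 200 hours of made-up pickup counts.
hourly_demand = list(range(200))
rows = add_lag_features(hourly_demand)
print(len(rows), rows[0])
```

Neighborhood-specific information, as described in the blurb, would be appended to each feature list before fitting a model.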

Ads That Click

Maragatham KN

Classifying ads using a CatBoost model based on the features of the ads and the user’s behavior.

Exploring Craigslist Musicians Communities

Robert Hill
Data Analyst at Anchor Worldwide

Robert used Craigslist to reflect what different musician communities value in aggregate.

A Friendly Introduction to Recommender Systems

Oren Trevet
Data Scientist at Fellowship.AI

Recommender systems are an effective way to overcome information overload. Oren wrote an article exploring the motivation behind recommendation systems, as well as providing an overview of different characteristics and potentials of various prediction techniques.

Predicting Crime in SF - a toy WMD / Machine Learning 101: from Linear Regression To Deep Learning

Orlando Torres

Orlando started this project to show the potential ethical conflicts created by our new algorithms. In every conceivable field, algorithms are being used to filter people. In many cases, the algorithms are obscure, unchallenged, and self-perpetuating.

How to Simplify Your Holiday Festive Meal Planning

Heng-Ru May

Once again, the holiday season is upon us... Should you find yourself preparing the whole meal or offering to contribute a dish or two, and in the mood for homemade culinary adventures, there’s a little web application called the MenuPlannerHelper (abbreviated as MenuHelper), developed a while back by Heng-Ru May, that could come in handy.

Fighting Gerrymandering: Using Data Science To Draw Fairer Congressional Districts

Joseph Gambino

Politicians have used gerrymandering, the practice of drawing political districts for partisan advantage, to skew elections since the early days of this great country... Joseph's goal was to build a tool that would let anyone optimize a map for whatever they think is most important.

Car Back! A Video-Based Car Detector For Cyclists

Rebekah Cunningham

Rebekah's vision is to attach a camera to the back of her bike, near the seat, that captures video in real time and alerts her to any cars approaching from behind. The alert would be an audio cue played in one of the apps that is already running, such as Strava, Spotify, or Audible.

Mix Retriever: A Hip-Hop Playlist Generator

Zach Heick

To combine the functionality of individual song-based playlist generators with a focus on making content-based recommendations, Zach created a web app that builds a hip-hop playlist of songs with similar lyrical meaning and mood around a song specified by the user.

More than a Million Pro-Repeal Net Neutrality Comments were Likely Faked

Jeff Kao

Jeff used natural language processing techniques to analyze net neutrality comments submitted to the FCC from April-October 2017, and the results were disturbing.

Targeting Disaster Relief from Space

Emily Miller

Emily used machine learning to better target disaster relief efforts, focusing on Typhoon Haiyan, which hit the Philippines in November of 2013.

Snow Prediction

James Cho

Using weather radar and terrain information to fill in gaps between ground snow sensors.

Ryan Lambert
Data Scientist at Gild

This project makes it easier to parse through the giant open-access database PubMed by allowing you to input anything from a news article clipping to an email thread.

The (Data) Science of Binge Watching on Netflix

Jamie Fradkin
Jr. Data Scientist at Buzzfeed

Who among us hasn’t fallen victim to the addictive power of a binge-worthy Netflix show? For Jamie's final project at Metis, she chose to explore elements in popular shows that might lead you to start “binge watching” on Netflix.

Halting the Spread of HIV

Emily Schuch
Data Scientist at Assembly Media

The HIV incidence rate is defined as the number of new HIV infections in a population in a given year. A rate of 0.4% means that 4 out of every 1000 people became newly infected with HIV.
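The arithmetic behind that definition can be written out as a short sketch. The function name and the input numbers are illustrative assumptions, not figures from Emily's project.

```python
def incidence_rate(new_infections, population):
    """Share of a population that became newly infected in the given year."""
    if population <= 0:
        raise ValueError("population must be positive")
    return new_infections / population


# Illustrative numbers only: 4 new infections per 1,000 people.
rate = incidence_rate(4, 1000)
print(f"{rate:.1%}")  # prints "0.4%"
```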

End-to-End Loan Funding Predictor

Frederik Durant
Staff Member Data Innovation at Colruyt Group

Frederik delved deep in the micro-finance mechanics at Kiva, looking for a practical problem to solve.

Music Composition with LSTMs

Naoya Kanai

Naoya explores the intersection between data and art by designing a recurrent neural network utilizing Long Short-Term Memory nodes (LSTMs) to learn patterns in the Six Cello Suites by J.S. Bach and generate its own musical fragments.

Creating a Beer Recommendation Engine

Will Chernetsky

To combat flaws in other recommendation systems, Will decided to use natural language processing of beer reviews to find similarity of language used to describe beers. He found the words people use to describe beer give better results than arbitrary scores or styles.
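One simple way to measure similarity of review language is cosine similarity over word counts, sketched below. This is an assumption for illustration, not necessarily the technique Will used; the beer names, review text, and function names are all hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical review text per beer (made up for this sketch).
REVIEWS = {
    "Stout A": "roasty coffee chocolate dark smooth",
    "Porter B": "dark chocolate roasty malty",
    "IPA C": "hoppy citrus pine bitter",
}


def bag_of_words(text):
    """Count how often each lowercase word appears in a review."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def most_similar(beer):
    """Return the other beer whose review language is closest to this one's."""
    target = bag_of_words(REVIEWS[beer])
    others = [name for name in REVIEWS if name != beer]
    return max(
        others,
        key=lambda name: cosine_similarity(target, bag_of_words(REVIEWS[name])),
    )


print(most_similar("Stout A"))
```

A fuller version would typically weight words (for example with TF-IDF) and remove common stop words before comparing reviews.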

NBA Matchup Analysis

Yong Cho
Data Scientist at GrubHub

Yong built an analytics tool for Vantage sports.

Marijuana through the lens of the New York Times

Peter Rasmussen

The legality of marijuana and the public’s view of it are rapidly changing as more states decriminalize and legalize the drug. As such, how have the words associated with marijuana in news articles changed over time?

Cracking Passwords with Neural Networks

Hasan Haq

You don't have to be an expert to know that password security is a big issue for companies these days. It seems every other week you hear of a well known website getting hacked. Hasan Haq's project uses neural networks to generate "dictionary" word lists to be used in password cracking.

Using convolutional neural networks to predict clothing

Brian Holligan

The Clothing Predictor is a web app that uses convolutional neural networks to identify images with one person in them, and then predict the clothing being worn by that person.

Personalizing Travel Recommendations with MemoTrek

Li Zhang

MemoTrek is an application that takes your travel photos as input and makes personalized recommendations for future travel destinations. It provides two types of recommendations: you-may-also-likes for a similar type of experience and something-different for new adventures.

Parking Lot Image Classification through OpenCV and a Flask App

Rohan Shah

As more data is sourced through satellite imagery it has become an important task to accurately identify important hotspots and targets within these images so as to classify them for practical use.

Applying Data Science to the Supreme Court

Emily Barry
Data Scientist at LegalServer

The Supreme Court is arguably the most important branch of government for guiding our future, but it's incredibly difficult for the average American to get a grasp of what's happening.

Basketball Player Tracker

Micheal Lai
Strategy Consultant & Data Scientist at IBM

Micheal created a system that can track players in a basketball clip and translate them to a coordinate grid. This kind of motion tracking already exists in the form of SportVU - but you can use the accessibility of YouTube clips to create player tracking.


Brian Kim
Data Scientist at FabFitFun

The app tracks Twitter trends in volume, sentiment, and topicality for 2016 Election candidates. It was built using Flask, MongoDB, D3, VADER sentiment analysis, and Gensim on an EC2 server.

Improving Brand Analytics with Image Logo Detection

Max Melnick

Using a convolutional neural net in TensorFlow, Max developed an application that can improve brand analytics through logo detection in images.

The Science of Singing Along

Garrett Hoffman
Senior Data Scientist at StockTwits

Garrett's project explores the "physics of pop culture", analyzing the culture that we love to consume every day with data science.

Visaurant: Reimagining the food search experience

Jeff Wen
Data Scientist at Tesla

Visaurant is a reimagination of the way users search through images that they are interested in. One prime use case for Visaurant is in sorting and filtering through food images (hence VIS-ual rest-AURANT).

Mapping Farmland from Satellite Imagery

Matt Maresca
Data Scientist at Annalect

The goal of Maresca's project was to perform semantic segmentation on satellite images in order to map out farmland around the city of Shanghai. He wanted to highlight a method that can be used to track farmland, urban development, and natural resources around the world in order to make better decisions for the future of our planet.

KenKen Solver

Ken Myers
Jr. Data Scientist at Uncommon Goods

Ken uses computer vision to solve KenKen puzzles. (Currently, this application only accepts 4x4 KenKens.) Simply upload a puzzle and get the solution.