Joyce Lee
Data Scientist at Clover Health
The opioid epidemic is one of the major public health catastrophes for this generation of Americans; similar to what tobacco/smoking or HIV/AIDS were to earlier generations, the opioid epidemic appears to be this era’s defining public health crisis. Lee set out to build a model to predict opioid-related mortality on a county-by-county basis, with location-based insights and interventions in mind as a larger goal.
Aaron Frederick
Being a statistics-motivated sports fan, Frederick wanted to solve an atypical basketball problem: How can we optimize a typical basketball player’s career in the NBA? The question itself may seem open-ended, so in order to better scope this endeavor, he measured success by dollars earned.
Matt Maresca
Data Scientist at Annalect
The goal of Maresca's project was to perform semantic segmentation on satellite images in order to map out farmland around the city of Shanghai. He wanted to highlight a method that can be used to track farmland, urban development, and natural resources around the world in order to make better decisions for the future of our planet.
AJ Davis
With cannabis being legal for medicinal use in 31 states and recreational use in 9 states, there are thousands of dispensaries from which one can obtain pricing data to analyze. Davis thought it was a good time to dive into cannabis pricing to build a model that outputs a price benchmark for dispensaries (a “dankstimate” in the vein of Zillow’s “zestimate”).
Brenner Heintz
Imagine you’re hosting a birthday party. Everyone’s having a great time, music’s playing, and the party is noisy. Suddenly, it’s time for birthday cake! It’s too loud to use Alexa, and rather than hunting for your phone or a remote control, what if you could simply raise an open hand while in mid-conversation, your smart home device would recognize that gesture, and turn off the music? And with the same gesture, you could dim the lights just as the candles are lit?
Jiamin Han
Although uniqueness and personalization are great selling points, keeping up with fashion trends is still the major theme that runs throughout the retail fashion business. The goal of this project is to find affordable alternatives to a designer outfit by using convolutional neural networks and other deep learning techniques.
David Dupuis
Data Scientist Researcher at Kwanko
There are some key elements to coding the game that can and probably should be memorized as they have other practical applications in computer science.
Xuan Qi
Machine Learning Engineer at Petuum, Inc.
Xuan Qi's goal for her project was to "accurately match customers with hotel inventory in this highly competitive market." On a personal note, she writes that "as a mom, when I book a hotel, I would like the hotel to be family friendly, closer to the sightseeing, and relatively quiet. But, my standards would be different booking a romantic weekend for me and my husband. We would like to pick the hotels with great food, closer to bars, and musical events are a plus."
Vicky Szuflita
Vicky created a model dedicated to recommending pitches to the Cubs in games against the Cardinals. (Technically, this model could help any team – or any talented pitcher quite frankly – when throwing pitches against Cardinals players, but this model is specifically dedicated to her beloved Cubs.)
Vladimir Lazovskiy
Data scientist working at the intersection of machine learning, content creation, and media.
In this project, Vladimir tackles the question: what is the relationship between machine learning and optimization? He explores how delivery companies can use machine learning to forecast travel times between two locations and use a genetic algorithm to find the best travel itinerary for each delivery truck.
Alex Smith
Alex chose to work with music data because it is a type of audio that can evoke emotion in addition to thought. When she listens to music, she asks herself, "Why does a particular song make me feel happy or sad?" The key of a song helps determine the feeling and is made up of the tonic note and the mode. For this project, she aimed to predict the mode.
Mattie Terzolo
In San Francisco, human waste is a growing issue, both for the people who run into it and for the people who have no other option than to relieve themselves on public streets. Mattie built a model that predicts where and when human waste will show up, which could be used to better inform resource allocation for programs like San Francisco’s Pitstop (a program that brings portable bathrooms to areas that have high homeless populations).
Alando Ballantyne
Founder & Data Scientist, Sovereign Finance
Image analysis and classification is something that Alando is passionate about (specifically as it pertains to analyzing satellite imagery to generate economic data for emerging economies). In this post, he writes about a few of the more common pixel classification techniques used in remote sensing.
Kalgi Shah
Kalgi created SNAPLOC, a product that uses automatic image classification and spatiotemporal analysis to recommend places of interest when traveling in a new city.
Ankur Vishwakarma
For his final project, Ankur decided to see if he could forecast hourly Uber demand across NYC neighborhoods. In addition to time-lagged features (such as previous week’s demand), he added information specific to each neighborhood to improve predictions.
Maragatham KN
Classifying ads using a CatBoost model based on the features of the ads and the user’s behavior.
Robert Hill
Data Analyst at Anchor Worldwide
Robert used Craigslist to reflect what different musician communities value in aggregate.
Oren Trevet
Data Scientist at Fellowship.AI
Recommender systems are an effective solution to information overload. Oren wrote an article exploring the motivation behind recommendation systems and giving an overview of the characteristics and potential of various prediction techniques.
Orlando Torres
Orlando started this project to show the potential ethical conflicts created by our new algorithms. In every conceivable field, algorithms are being used to filter people. In many cases, the algorithms are obscure, unchallenged, and self-perpetuating.
Heng-Ru May
Once again, the holiday season is upon us... Should you find yourself preparing the whole meal, or offering to contribute a dish or two, and in the mood for homemade culinary adventures, a little web application that Heng-Ru May developed a while back, called MenuPlannerHelper (abbreviated as MenuHelper), could come in handy.
Joseph Gambino
Politicians have used gerrymandering, the practice of drawing political districts for partisan advantage, to skew elections since the early days of this great country... Joseph's goal was to build a tool that would let anyone optimize a map for whatever they think is most important.
Rebekah Cunningham
Rebekah's vision is to attach a camera to the back of her bike, near the seat, that captures video in real time and alerts her to any cars approaching from behind. The alert would be an audio cue played in one of the apps that is already running, such as Strava, Spotify, or Audible.
Zach Heick
To combine the functionality of individual song-based playlist generators with a focus on making content-based recommendations, Zach created a web app that builds a hip-hop playlist of songs with similar lyrical meaning and mood around a song specified by the user.
Jeff Kao
Jeff used natural language processing techniques to analyze net neutrality comments submitted to the FCC from April-October 2017, and the results were disturbing.
James Cho
Using weather radar and terrain information to fill in gaps between ground snow sensors.
Andrew Wiegel
Ryan Lambert
Data Scientist at Gild
PUBmatch.co makes it easier to parse through the giant open access database PubMed by allowing you to input anything from a news article clipping to an email thread.
Galen Ballew
Using LinearSVC and Histogram of Oriented Gradients to detect and track vehicles.
Lauren Shareshian
Lauren used Zillow metadata, natural language processing on realtor descriptions, and a convolutional neural net on home images to predict Portland home sale prices.
Daniel Licht
Antonia Antonova
This project aims to build a deep learning pipeline that takes text descriptions and generates unique video depictions of the content described.
Matt Murray
Emily Miller
Emily used machine learning to better target disaster relief efforts, focusing on Typhoon Haiyan, which hit the Philippines in November of 2013.
Phillip Tan
Josh Peng
Motorcycle Lean Assist uses a convolutional neural network to detect the lean angle of a motorcycle through image classification, providing you with rider feedback on your current lean angle so you don’t have to guess.
Tim Martin
Tim's project explores the conversations about climate change that took place on Twitter in March 2017. With 1 million tweets from 560,000 users, Tim identified people belonging to different communities and used tools such as the Twitter API, Spark, NetworkX, and Gephi to derive insight from those conversations.
Max Melnick
Using a convolutional neural net in TensorFlow, Max developed an application that can improve brand analytics through logo detection in images.
Naoya Kanai
Naoya explores the intersection between data and art by designing a recurrent neural network utilizing Long Short-Term Memory nodes (LSTMs) to learn patterns in the Six Cello Suites by J.S. Bach and generate its own musical fragments.
Will Chernetsky
To combat flaws in other recommendation systems, Will decided to use natural language processing of beer reviews to find similarities in the language used to describe beers. He found that the words people use to describe beer give better results than arbitrary scores or styles.
Peter Rasmussen
The legality of marijuana and the public’s view of it are rapidly changing as more states decriminalize and legalize the drug. As such, how have the words associated with marijuana in news articles changed over time?
Justin Chien
Using Flask, Justin created a web app to combine his passion for photos and cars. Motordex uses the decision tree process and different models to identify cars based on a submitted picture.
Hasan Haq
You don't have to be an expert to know that password security is a big issue for companies these days. It seems every other week you hear of a well-known website getting hacked. Hasan Haq's project uses neural networks to generate "dictionary" word lists to be used in password cracking.
Katherine Pully
MedTracker is a system to track your (psychiatric) medications and your moods and to compare what is working well for other users. Users can register as a patient or a doctor.
Brian Holligan
The Clothing Predictor is a web app that uses convolutional neural networks to identify images with one person in them, and then predict the clothing being worn by that person.
Li Zhang
MemoTrek is an application that takes your travel photos as input and makes personalized recommendations for future travel destinations. It provides two types of recommendations: you-may-also-likes for a similar type of experience and something-different for new adventures.
Rohan Shah
As more data is sourced through satellite imagery it has become an important task to accurately identify important hotspots and targets within these images so as to classify them for practical use.
Emily Barry
Data Scientist at LegalServer
The Supreme Court is arguably the most important branch of government for guiding our future, but it's incredibly difficult for the average American to get a grasp of what's happening.
Micheal Lai
Strategy Consultant & Data Scientist at IBM
Micheal created a system that can track players in a basketball clip and translate them to a coordinate grid. This kind of motion tracking already exists in the form of SportVU - but you can use the accessibility of YouTube clips to create player tracking.
Emily Schuch
Data Scientist at Assembly Media
The HIV incidence rate is defined as the number of new HIV infections in a population in a given year. A rate of 0.4% means that 4 out of every 1000 people became newly infected with HIV.
Brian Kim
Data Scientist at FabFitFun
The app tracks Twitter trends in volume, sentiment, and topicality for 2016 Election candidates. It was built using Flask, MongoDB, D3, VADER sentiment analysis, and Gensim on an EC2 server.
Rui Chang
Lead Data Scientist at Target
This project aims to estimate house prices from their features using publicly available data, and to build a web application for house price estimation.
Ken Myers
Jr. Data Scientist at Uncommon Goods
Ken uses computer vision to solve KenKen puzzles. (Currently, this application only accepts 4x4 KenKens.) Simply upload a puzzle and get the solution.
Jeff Wen
Data Scientist at Tesla
Visaurant is a reimagination of the way users search through images that they are interested in. One prime use case for Visaurant is in sorting and filtering through food images (hence VIS-ual rest-AURANT).
Jamie Fradkin
Jr. Data Scientist at Buzzfeed
Who among us hasn’t fallen victim to the addictive power of a binge-worthy Netflix show? For Jamie's final project at Metis, she chose to explore elements in popular shows that might lead you to start “binge watching” on Netflix.
Ash Chakraborty
Data Scientist at Credit Sesame
At a recent DataKind SF event, Ash was rather intrigued by the challenges faced in investigating wage theft and other labor violations not just throughout the nation, but also specific to California and the Bay Area regions.
Frederik Durant
Staff Member Data Innovation at Colruyt Group
Frederik delved deep into the microfinance mechanics at Kiva, looking for a practical problem to solve.
Yong Cho
Data Scientist at GrubHub
Yong built an analytics tool for Vantage Sports.
Henri Dwyer
Data Scientist at Dataiku
Henri is currently working on classifying chords from audio using neural networks.
Garrett Hoffman
Senior Data Scientist at StockTwits
Garrett's project explores the "physics of pop culture", analyzing the culture that we love to consume every day with data science.