Ben Wellington is perhaps best known for his blog called I Quant NY, where he tells the story of New York City through the data the city releases.
He is a Visiting Assistant Professor at The Pratt Institute in Brooklyn, where he teaches a course on statistics to future Urban Planners. He is also a Quantitative Analyst at Two Sigma and teaches job training and team building workshops with Cherub Improv, a non-profit that uses improv comedy for social good.
Paul Buffa is a Teaching Assistant at Metis and Director of Business Analytics & Strategy for Data Science at Kaplan Test Prep. In anticipation of Ben's visit to speak to the Metis Data Science Bootcamp as part of our Speaker & Event series, Paul asked Ben about his work, how he selects data to analyze, and his dream data set.
Paul: It may not be the piece that got you the most buzz, but which of your analyses is your favorite?
Ben: I was really amazed to see in the data that half of New York City cabs were charging different default tips than the other half. This pointed to a real issue for taxi drivers, where half of the drivers were making $200 more a year than the other half, something that is hardly equitable. It also makes the purchase decision confusing for riders, who should not have to check which vendor is being used to figure out if they are tipping on tolls and taxes or not.
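The gap Ben describes comes down to which amount the default tip percentage is applied to. Here is a minimal sketch of that arithmetic. The fare, toll, tax, and tip rate are made-up values for illustration, and `default_tip` is a hypothetical helper, not anything taken from the taxi vendors' systems.

```python
def default_tip(fare, tolls, taxes, rate, include_extras):
    """Return a default tip, computed on the fare alone or on fare plus extras."""
    base = fare + tolls + taxes if include_extras else fare
    return round(base * rate, 2)

# Hypothetical trip: these amounts are illustrative, not from the city's data.
fare, tolls, taxes = 20.00, 5.00, 0.50
rate = 0.20

tip_fare_only = default_tip(fare, tolls, taxes, rate, include_extras=False)
tip_with_extras = default_tip(fare, tolls, taxes, rate, include_extras=True)

print(f"Tip on fare only:         ${tip_fare_only:.2f}")
print(f"Tip on fare + tolls/tax:  ${tip_with_extras:.2f}")
print(f"Difference on this trip:  ${tip_with_extras - tip_fare_only:.2f}")
```

The same button press produces a different tip depending on the base, which is how small per-trip differences can add up over a year of fares.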
Paul: You talk a lot about what type of data is out there for analysis, and note that NYC is better than other cities when it comes to releasing lots of great data (though it still has lots of room for improvement). That said, what's the dream NYC dataset you would love to get your hands on (and what would you do with it)?
Ben: I am really excited for the FDNY fire call dataset. It was due out at the end of last year. I want to build a model to predict fires, crossing the fire data with other city data sets to see what is predictive.
Paul: You've covered a lot of different topics on your site. What's the process you go through for determining whether something that catches your interest is worth the time and effort for a full-fledged analysis?
Ben: When I get a data set, I look at each column and ask myself, "Is there something I would want to know about this column?" Sometimes it's the maximum, sometimes it's the most common value, sometimes it's the minimum, or sometimes it's a correlation. I don't really know in advance, but asking questions column by column can often lead to new insights. Parking tickets are great examples of this. There are so many diverse questions one could ask with columns like Year, Make, Model, Location, Time, Violation Type, etc.
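The column-by-column habit Ben describes might be sketched roughly as follows. This is only an illustration under stated assumptions: the `tickets` rows are invented sample data, the column names are simplified stand-ins for the ones he lists, and `profile_column` and `pearson` are hypothetical helpers rather than his actual workflow.

```python
from collections import Counter

# Invented sample rows standing in for a parking ticket data set.
tickets = [
    {"year": 2012, "make": "HONDA", "violation": "NO PARKING", "fine": 65},
    {"year": 2008, "make": "TOYOTA", "violation": "EXPIRED METER", "fine": 35},
    {"year": 2012, "make": "HONDA", "violation": "NO PARKING", "fine": 65},
    {"year": 2015, "make": "FORD", "violation": "FIRE HYDRANT", "fine": 115},
    {"year": 2012, "make": "HONDA", "violation": "NO PARKING", "fine": 65},
]

def profile_column(rows, column):
    """Ask the basic questions of one column: most common value, plus min/max if numeric."""
    values = [row[column] for row in rows]
    summary = {"most_common": Counter(values).most_common(1)[0][0]}
    if all(isinstance(v, (int, float)) for v in values):
        summary["min"] = min(values)
        summary["max"] = max(values)
    return summary

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)

# Walk the data set one column at a time.
profiles = {column: profile_column(tickets, column) for column in tickets[0]}
for column, summary in profiles.items():
    print(column, summary)

# One possible cross-column question: does the fine move with the vehicle year?
year_fine_corr = pearson([t["year"] for t in tickets], [t["fine"] for t in tickets])
print("year vs. fine correlation:", round(year_fine_corr, 3))
```

With a real data set the same loop would run over every column, and whichever summary looks surprising becomes the question worth digging into.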
Paul: What's next up for I Quant NY?
Ben: I take it day by day, so I never really know what's next. Sometimes I see a tweet that makes me think, "Hmm, can I quantify that?" Sometimes I hear about an upcoming hearing. Or sometimes I see a new data set. I guess you'll have to come visit the blog to see what's next.