Earlier this week, we wrote about current Metis student Jeff Kao's viral blog post, which included results from his ongoing bootcamp final project. In the post, he described exploring and analyzing millions of comments, supposedly both for and against the proposed repeal of net neutrality, submitted to the FCC between April and October of 2017. He found that at least 1.3 million of them were likely faked.
After the post took off online, earning tens of thousands of likes and shares and being cited in Fortune, The Washington Post, Engadget, Quartz, The Stranger, and other publications, we caught up with Kao to ask what the experience of going viral has been like, why he chose to focus on the topic of net neutrality, and what advice he might have for other aspiring data scientists like himself.
Tell us a little about your background.
I guess you could say my background is pretty eclectic. I have a law degree and an engineering degree. I've worked at a couple of startups and a large law firm. In college, I was also lucky to have had internships in all sorts of fields where my technical skills were useful, e.g., patent law, biomedical research, the retail side at a major bank, aerospace, and at a traditional "mature" tech company.
What prompted you to choose this topic – public comments to the FCC on net neutrality – for your research?
I was a summer law clerk at FCC Commissioner Mignon Clyburn's office and a net neutrality proponent even before law school (aside: yes, I do have a personal point of view in all of this; my code and data should, of course, be reviewed with that context in mind).
I also saw a lot of data science research that was really well done, but which only looked at specific snapshots in time, as they were done by folks who had spent some of their free time doing this. This all hinted to me that there might be more to be found in the data. The only other study at the time that had looked at the full set of submitted comments as a comprehensive whole had been done by an industry-sponsored consultancy in D.C. I also wanted to practice my NLP (natural language processing) skills and do a text-only project, and one that involved a fairly substantial amount of data. Ever since I started seriously thinking about data science, I've been writing scattered notes to myself about projects that I'd like to dig deeper on. This project seemed the best one to do out of that list, at the time.
What was your reaction when you saw your blog post going viral?
To be honest, I could not believe that it was happening. I wasn't even thinking about taking the time to make a blog post until after my final project was complete (mid-December), but my classmate Rebekah emailed me about New York Attorney General Schneiderman's investigation. That's when I realized that folks with the power to elicit more transparency from the FCC Chair's office were also looking into the irregularities lurking just beneath the surface of the data. I made the post and hoped that someone in the AG's office would see it, or that I could pass it along through someone I knew working there. I had no idea the post would blow up like this.
I should also add that the practice my other Metis projects gave me really helped me with the blog post. I went into the program thinking that I was going to apply data science, machine learning, and deep learning to interesting problems. At Metis, I learned that communication was an equally important part of the work. I made visualizations that most clearly communicated the situation to a non-technical audience, got my wife to edit the words that I wrote (as a former lyricist/English teacher, she's the writer in the family), and posted something as soon as I could.
What brought you to Metis?
I was seeing a huge increase in the use of data science and machine learning techniques – a lot of stuff that I had learned about and played with in college, but that had seen only a few applications at the time. There weren't a lot of libraries, communities, or a standardized methodology built up around it.
I tinkered around with a few online courses, loved doing the work, and decided I needed to jump in ASAP. I had been a few years out of school, and doing a master's immediately didn't feel right for various reasons. Metis seemed like a great way to refocus my career and do something that was really interesting, so I took a leave from my job and started.
Do you have any advice for other aspiring data scientists?
As a data scientist who's just starting out in the field, I would say that I don't have enough data points to say, but I am excited to collect a few more by continuing in the field! On a personal level, I do feel like I've been lucky enough to find projects I care about and to which I can bring my full skillset, experience, and passion, and work as a complete person. That is important to me.
See other projects made by Metis students.