Paul Trowbridge on the Importance of Having a Solid Stats Foundation

By Emily Wilson • January 05, 2018

Paul Trowbridge is the instructor of our upcoming Live Online Statistical Foundations for Data Science & Machine Learning part-time professional development course, which will run from January 22nd - February 28th on Monday and Wednesday evenings from 6:30 - 9:30 pm EST. View full course details and enroll here

Paul Trowbridge believes in the importance of knowing how things work. What makes them go? What's going on underneath the exterior? And why should you care to find out?

After receiving advanced training in statistics, demography, and sociology from the University of Washington and Rutgers University, Trowbridge has worked in applied fields such as fMRI, epidemiology and public health, international relations, urban planning, and micro-simulation modeling. He's taught statistics, data science, and data visualization at New York University's School of Professional Studies and is all set to teach our upcoming Statistical Foundations course.

In a recent Q&A, he discussed his career, why he thinks it's so important to have a statistical foundation when approaching data science work, and much more. 

The intent of your upcoming Statistical Foundations course is to expose students to common statistical issues and teach them how to avoid statistical fallacies. What about the course makes you most excited to teach it?

Introducing students to the first principles underlying common data science methods. By learning the underlying material on probability, estimation, and hypothesis testing, students gain a deeper knowledge of the principles involved in their work as data scientists and consequently can bring deeper insight to applied data science problems. I am always excited to see students understand and become confident producing their own solutions via first principles, as opposed to simply quoting output they may not fully understand.

Your course will be taught Live Online. What benefits do you think there are to that format?

The live online format allows students to take the course from remote and distributed locations, which is a big convenience. The format is live and synchronous, allowing for dynamic interaction and feedback from the instructor as well as student questions and answers. It also facilitates student collaboration and peer-to-peer engagement.

By the end of the course, students will understand many of the principles underlying machine learning and data science. How important do you think it is for data scientists to understand these inner workings?

It is very important for two reasons. First, modern computational facilities release researchers and practitioners from having to rely on methods that impose undue restrictions or assumptions that are too strong or inappropriate for the problem at hand. Second, when investigating a novel problem, there may not be existing solutions, precisely because the problem is novel. In this case, practitioners will need to develop their own solutions from first principles. In both cases, implementing custom-tailored statistical methodology requires knowledge of the underlying principles, which the course teaches. Moreover, even when practitioners aren't developing novel methodology, a deeper understanding of the inner workings of the applied methods allows for deeper insight into data science problems and richer conclusions. In short, it matters both for gaining deeper insight into data science problems and for being able to custom-tailor methodologies to unique problems.

You've personally worked in fields spanning fMRI, epidemiology and public health, international relations, urban planning, and micro-simulation modeling. If you are at liberty to share, what are some projects you've worked on recently that you're particularly proud of?

When working on the fMRI project, I introduced random effects models to capture subject-specific dependencies in a repeated-measures experimental context. I also introduced multivariate visualization techniques, such as multidimensional scaling and principal components, to visualize relationships in high-dimensional data sets.

Whose work inspires your own?

The faculty at the University of Washington, and particularly the faculty involved with the Center for Statistics and the Social Sciences, definitely established my basic approach to applied data analysis projects and continue to influence my approach to statistical analysis broadly. Additionally, many projects in contemporary data visualization and information design engage problems in data science. Fernanda Viegas, Ben Fry, Jen Lowe, the Onformative studio, and Periscopic all produce thought-provoking visual data engagements that have strong ties to contemporary data science problems.

How do you stay up-to-date in a quickly evolving field?

Read. Definitely read as much as I can. Attend lots of presentations and see what people are currently working on. Just by staying active and connected in the field, one is constantly introduced to new and exciting work. Service opportunities in the field are also excellent ways to see, engage with, and keep up to date on it.


Enroll here to learn from Paul. Want to try out the Live Online format before enrolling? Register for a free sample class on 1/9. And take note: special New Year's pricing expires 1/12. Get $350 off the course price if you enroll by then!
