When our Sr. Data Scientists aren't teaching the intensive, 12-week bootcamps or corporate training courses, they're working on a variety of other projects. This monthly blog series tracks and discusses some of their recent activities and accomplishments.
Cliff Clive, Metis Sr. Data Scientist (Bootcamp)
With his new website, Data Science MVP, Metis Sr. Data Scientist Cliff Clive is on a mission to promote better engineering practices among new data scientists. In his first post, he explains that the site's title stands for Minimum Viable Product, which is "a first draft of a data science project in which we've put together enough of our workflow to read in some data, put it into a workable format for our tools to handle, train a basic model, and calculate some preliminary results. The results can be absolute garbage, and that's okay. An MVP is an engineering effort, meant to provide us with a pipeline to quickly develop new iterations of our work, and to produce a baseline model that we can use to benchmark our more serious findings."
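To make the idea concrete, here is a minimal sketch of what such a first-draft pipeline might look like. It is not taken from Clive's site or template: the toy data, the column names (`sqft`, `price`), and every function name are hypothetical, and the "model" is deliberately the crudest possible baseline (always predict the training mean). The point is only that all four steps from the quote (read, format, train, score) run end to end.

```python
import csv
import io
import statistics

# Hypothetical toy data standing in for a real data source.
RAW_CSV = """sqft,price
800,160
1000,210
1200,230
1500,310
1800,350
2000,400
"""


def load_rows(text):
    """Read raw CSV text and convert it to (feature, target) float pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(float(row["sqft"]), float(row["price"])) for row in reader]


def split(rows, n_test):
    """Hold out the last n_test rows for evaluation."""
    return rows[:-n_test], rows[-n_test:]


def train_baseline(train_rows):
    """'Train' a basic model: ignore the feature, predict the mean target."""
    mean_target = statistics.mean(y for _, y in train_rows)
    return lambda x: mean_target


def mean_absolute_error(model, rows):
    """Preliminary result: average absolute error on the given rows."""
    return statistics.mean(abs(model(x) - y) for x, y in rows)


def run_pipeline(text, n_test=2):
    """Run every stage in order and return the baseline score."""
    rows = load_rows(text)
    train, test = split(rows, n_test)
    model = train_baseline(train)
    return mean_absolute_error(model, test)


if __name__ == "__main__":
    print(f"Baseline MAE: {run_pipeline(RAW_CSV):.1f}")
```

The score itself will be poor, which is fine for a first draft. Once the pipeline exists, swapping `train_baseline` for a real model is a one-line change, and the baseline number gives later iterations something to beat.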
In his most recent post, OSEMN is Awesome, but AOSEMN is Awesomer, he covers the importance of spending time carefully designing a data science project before diving into the data. OSEMN stands for Obtain, Scrub, Explore, Model, and iNterpret, and it's a widely adopted framework for data science.
"Workflows are effective because they provide direction that keeps us moving through each stage of a project," writes Clive. "When we adopt them, we minimize the time spent wondering about the next thing we should do." Check out the post in full to learn why design is crucial to any effective workflow, how to write a good abstract, and how to build and iterate.
(You can also find a project template associated with his blog on GitHub.)
Adam Wearne, Metis Sr. Data Scientist (Bootcamp)
In his first post on Medium, Metis Sr. Data Scientist Adam Wearne provides a comprehensive Intro to PyTorch with NLP.
"When it comes to options for deep learning within the Python ecosystem, there are TONS of choices," he writes. "Keras is a great choice for starting out and for quickly developing and iterating on models, pure Tensorflow is amazingly fast, and with the recent advent of Tensorflow 2.0, will only become more awesome. However, over the past few years, there has been a huge surge in popularity for Pytorch...I’d like to introduce some of the main Pytorch concepts, and apply them to a common task in natural language processing: Named Entity Recognition (NER)."
Want more? There's plenty of it. Read the full post here.
Damien Martin, Metis Sr. Data Scientist (Corporate Training)
In a previous blog post, Metis Sr. Data Scientist Damien Martin discussed the benefits of business leaders upskilling their employees in order to investigate trends within data, thus helping to find high-impact projects. When everyone on your team is thinking about business problems at a strategic level, all will be able to add value based on insight from each person’s specific job function. In turn, having a data literate and empowered workforce allows your data science team to work on projects rather than ad hoc analyses.
In this follow-up post, Martin breaks down the process of Scoping a Data Scientist Project, which is what happens after someone on your team has identified an opportunity (or a problem) where data science can likely help.
Read Damien's post here.
What were our Sr. Data Scientists up to last month? Find out here.