Photo by Kelly Sikkema via Unsplash
This post was written by Damien Martin, Sr. Data Scientist on the Corporate Training team at Metis.
In a previous article, we discussed the benefits of up-skilling your employees so they can investigate trends within data and help find high-impact projects. If you implement these suggestions, everyone will be thinking about business problems at a strategic level, and you will be able to add value based on insight from each person’s specific job function. A data-literate and empowered workforce frees the data science team to work on projects rather than ad hoc analyses.
Once we have identified an opportunity (or a problem) where we think that data science could help, it is time to scope out our data science project.
The first step in project planning should come from business concerns. This step can typically be broken down into the following subquestions:
There is nothing in this evaluation process that is specific to data science. The same questions could be asked about adding a new feature to your website, changing the opening hours of your store, or changing the logo for your company.
The owner for this stage is the stakeholder, not the data science team. We are not telling the data scientists how to accomplish their goal, but we are telling them what the goal is.
Just because a project involves data doesn't make it a data science project. Consider a company that wants a dashboard that tracks a key metric, such as weekly revenue. Using our previous rubric, we have:
Even though we may use a data scientist (particularly in small companies without dedicated analysts) to build this dashboard, this isn't really a data science project. This is the sort of project that can be managed like a typical software engineering project. The goals are well-defined, and there isn't a lot of uncertainty. Our data scientist just needs to write the queries, and there is a "correct" answer to check against. The value of the project isn't the amount we expect to spend, but the amount we are willing to spend on creating the dashboard. If we have sales data sitting in a database already, and a license for dashboarding software, this might be an afternoon's work. If we need to build the infrastructure from scratch, then that would be included in the cost for this project (or at least amortized over the projects that share the same resource).
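To illustrate why the dashboard has a "correct" answer to check against, here is a minimal sketch of the weekly revenue calculation. The sales records and the `weekly_revenue` helper are hypothetical, made up for illustration; a real dashboard would query the company's own sales database.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sales records: (sale date, revenue in dollars).
sales = [
    (date(2020, 11, 2), 120.0),
    (date(2020, 11, 4), 80.0),
    (date(2020, 11, 9), 200.0),
]


def weekly_revenue(records):
    """Sum revenue per (ISO year, ISO week number)."""
    totals = defaultdict(float)
    for day, amount in records:
        iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        totals[(iso[0], iso[1])] += amount
    return dict(totals)


weekly = weekly_revenue(sales)
for (year, week), revenue in sorted(weekly.items()):
    print(f"{year}-W{week:02d}: ${revenue:,.2f}")
```

Every number this produces can be verified by hand against the raw records, which is what makes the work easy to scope and to test.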
One way of thinking about the difference between a software engineering project and a data science project is that features in a software project are often scoped out separately by a project manager (perhaps in conjunction with user stories). For a data science project, determining the "features" to be added is a part of the project.
A data science project might have a well-defined problem (e.g. too much churn), but the effectiveness of the solution is unknown. While the project goal might be "reduce churn by 20 percent", we don't know if this goal is achievable with the information we have.
Adding data to your project is typically expensive (either building infrastructure for internal sources, or paying for subscriptions to external data sources). That's why it is so crucial to assign an upfront value to your project. A lot of time can be spent generating models and failing to reach the targets before realizing that there is not enough signal in the data. By keeping track of model progress through different iterations, along with ongoing costs, we are better able to judge whether we need to add data sources (and price them appropriately) to hit the desired performance goals.
Many of the data science projects that you try to implement will fail, but you want to fail quickly (and cheaply), saving resources for projects that show promise. A data science project that fails to meet its target after 2 weeks of investment is part of the cost of doing exploratory data work. A data science project that fails to meet its target after 2 years of investment, on the other hand, is a failure that could probably have been avoided.
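One way to make "fail quickly" concrete is to review cumulative spend and the best result so far after every iteration. The sketch below is an illustration under assumed numbers, not a prescribed process: the `Iteration` record, the `review` function, the $50,000 project value, and the iteration costs are all hypothetical, and the 20 percent target simply reuses the churn example above.

```python
from dataclasses import dataclass


@dataclass
class Iteration:
    name: str
    cost: float             # spend on this iteration, in dollars
    churn_reduction: float  # measured reduction as a fraction (0.05 = 5%)


def review(iterations, project_value, target):
    """Return 'target met', 'stop', or 'continue' for the work so far."""
    spent = sum(it.cost for it in iterations)
    best = max((it.churn_reduction for it in iterations), default=0.0)
    if best >= target:
        return "target met"
    if spent >= project_value:
        # We have spent what the project is worth without hitting the goal.
        return "stop"
    return "continue"


history = [
    Iteration("baseline model", cost=5_000, churn_reduction=0.04),
    Iteration("more features", cost=8_000, churn_reduction=0.07),
]
early_decision = review(history, project_value=50_000, target=0.20)

# A hypothetical third iteration that adds an expensive external data source.
history.append(Iteration("external data", cost=40_000, churn_reduction=0.09))
late_decision = review(history, project_value=50_000, target=0.20)
```

A real review would also weigh how fast the metric is improving per dollar spent, but even this simple rule forces the upfront value to be stated before work begins.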
When scoping, you want to bring the business problem to the data scientists and work with them to formulate a well-posed problem. For example, you may not have access to the data you need for your proposed measure of whether the project succeeded, but your data scientists could suggest a different metric that might serve as a proxy. Another element to consider is whether your hypothesis has been clearly stated (Metis Sr. Data Scientist Kerstin Frailey has written a great post on that topic).
Here are some high-level areas to consider when scoping a data science project:
While the bulk of the cost for a data science project involves the initial setup, there are also recurring costs to consider. Some of these costs are obvious because they are explicitly billed. If you require the use of an external service or need to rent a server, you receive a monthly bill for that ongoing cost.
But in addition to these explicit costs, you should consider the following:
The expected maintenance costs (both in terms of data scientist time and external subscriptions) should be estimated up front.
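As a rough sketch of that upfront estimate, the arithmetic below combines the initial cost with recurring subscriptions and data scientist time over a planning horizon. All figures, parameter names, and the 24-month horizon are hypothetical placeholders, not numbers from any real project.

```python
def total_cost(upfront, monthly_subscriptions, maintenance_hours_per_month,
               hourly_rate, months):
    """Upfront cost plus recurring costs over a planning horizon, in dollars."""
    monthly = monthly_subscriptions + maintenance_hours_per_month * hourly_rate
    return upfront + monthly * months


# Hypothetical inputs: $30,000 to build, $500/month in subscriptions,
# and 10 hours/month of data scientist time at $100/hour, over 24 months.
estimate = total_cost(
    upfront=30_000,
    monthly_subscriptions=500,
    maintenance_hours_per_month=10,
    hourly_rate=100,
    months=24,
)
print(f"Estimated total cost: ${estimate:,}")
```

Comparing this total against the value assigned to the project shows whether the recurring costs change the decision to go ahead.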
When scoping a data science project, there are several steps, and each of them has a different owner. The evaluation stage is owned by the business team, as they set the goals for the project. This involves a careful evaluation of the value of the project, covering both the upfront cost and the ongoing maintenance.
Once a project is deemed worth pursuing, the data science team works on it iteratively. The data used, and progress against the main metric, should be tracked and compared to the initial value assigned to the project.
Metis offers training to upskill your technical team in data science and machine learning, as well as trainings to help executives become more fluent in understanding data and its value. If you have managers who would like training on managing data science projects, please get in touch with our corporate training team, or fill out this form.