This post was written by Kerstin Frailey, Sr. Data Scientist on the Corporate Training team at Metis.
Good data science does not imply good business. Certainly, good data science can lead to good business, but there’s no guarantee that even the best performing machine learning algorithm will lead to any uptick in revenue, customer satisfaction, or board member approval.
How can this be? After all, data science teams are full of smart, well-compensated individuals driven by curiosity and empowered by technology. How could they not move the bottom line?
In general, the output of a data science project is not, itself, a driver of impact. The output informs some decision or interacts with some system that drives impact. Clustering customers by behavior won’t improve sales on its own, but creating product bundles for those clusters might. Predicting late deliveries won’t improve customer satisfaction, but sending a push notification warning customers of the potential issue might. Unless your product actually is data science, there’s almost always a step that must connect the output of data science to the impact we want it to drive.
The problem is that we often take that step for granted. We assume that if the data science project is successful then the impact will follow. We see this assumption hiding in the most conspicuous places: in OKRs that measure new users and not algorithm performance, on dashboards that display revenue but not precision, in the single and unchallenged sentence on a planning document that states how a project will change the business.
Too often this how step is assumed to be feasible, reasonable, and without risk. But, in reality, the how is a guess. It’s a hope. It’s a hypothesis – one that we call the impact hypothesis.
The impact hypothesis is the idea that connects the output of the data science project and the impact on the business. It is the how upon which the transformation of your business hinges.
An Illustrative Example
Let’s consider a common data science project: predicting customer churn. The first line of the planning document states the goal as “to predict customer churn in order to reduce the number of churned customers through direct incentives and promotions.”
The data science goal is to “predict customer churn.” The document details potential solution paths, technological overhead, holdout group selection, features to engineer, relevant subject matter experts, and on and on.
The desired business impact is “to reduce the number of churned customers.” The document and soon-to-be-built dashboard define the exact metric by which to calculate churned customers and the cadence at which it is measured.
The assumption of how impact will occur is “through direct incentives and promotions.” It’s unlikely that the document contains even one more sentence explaining how direct incentives and promotions will reduce churn. It’s simply assumed that they will.
The Dangers of an Unchecked Assumption
We asked before, “how can a successful data science project not be an impactful one?”
By assuming that it will be.
But if that assumption fails, the entire project will be for naught. It will mean wasted time and resources. When a data science project succeeds but the impact hypothesis fails, it can be devastating to the morale of the data team. If the data team is centralized, they’ll be reluctant to work with your team in the future. If the data science team is embedded, they’ll feel underappreciated and unmotivated. But all this can be avoided by identifying and challenging your impact hypothesis early.
That assumption fails all too often, and almost always because it was never fully vetted. Instead of making an assumption, we need to recognize that the how is a hypothesis.
State the Impact Hypothesis
First, we must explicitly state the hypothesis. In terms of our example, the impact hypothesis is “Targeting customers who would otherwise churn with direct incentives and promotions will reduce the number who ultimately churn.”
After seeing it written out, we might realize the hypothesis lacks specificity around implementation. A more precise hypothesis, like “Targeting online customers who would otherwise churn with direct email incentives and discounted promotions will reduce the number who ultimately churn,” will help us formulate an impact plan and direct future action.
Stating the hypothesis refines the idea and cements its details. It also removes the presumption of correctness, inviting the critical eye so badly needed and so rarely afforded. As with any hypothesis, our goal during critique is to identify when and how it can fail.
Vet the Impact Hypothesis
Now that we’ve dismissed the assumption, let’s critique the hypothesis.
How might the example’s impact hypothesis fail?
- If we’ve saturated our customer base with promotions to the point where additional incentives have no impact.
- If we run out of budget and cannot incentivize customers.
- If customers are not leaving due to a cost issue.
- If customers are churning as an expression of protest.
- If customers no longer have a use for the product.
And countless other ways.
The point of recognizing the impact hypothesis isn’t to find an unflappable one, but to identify and plan for ways yours might fail. Every hypothesis will have points of potential failure (and if you can’t find them, you’re not trying hard enough).
Document and Communicate Your Findings
After identifying and vetting the hypothesis, document your findings. This nontechnical planning and scoping should be included in the larger project’s documentation, and its results should be shared with the data science team and all stakeholders. Doing so will enable the data science team to narrow their solution paths to ones that fit your impact plan. It will also help nontechnical team members ensure they don’t create barriers to your planned impact. Documenting and communicating your findings will protect the project’s impact both during the project and after it is complete.
Respond to Critical Failure
Some hypotheses will fail altogether under scrutiny. When this occurs, discard the project. Even if the data science project was exciting, the team should move on to a project with a sounder impact hypothesis. If you want to avoid sunk costs and broken hearts, you should vet the impact hypothesis before the project ever starts.
The details of how data science will drive impact are so often left to be figured out at some point in the future, when the machine learning algorithm is humming along and (hopefully) hitting its numbers. It’s assumed that stakeholders will be able to take the data team’s output and turn it into impact. Yet we know that if this assumption fails, it is impossible for the data science project to be impactful, regardless of its precision, recall, or any other performance metric.
Here we’ve outlined a process to critically consider the how. By identifying, vetting, and communicating the impact hypothesis, we treat the how as being as important as the data science and the impact it connects. With a strong impact hypothesis, the data science output connects directly to the impact. Without one, a project falls apart: not quickly, but only after the data science is done and is about to become a sunk cost.
The impact hypothesis is the keystone of applied data science; it’s the idea that binds together the output and the impact. A strong impact hypothesis is the difference between data science for its own sake and data science that transforms your business.
Metis provides corporate training on all aspects of data science in business, from technical upskilling to data science management and more.