The Impact Hypothesis: The Keystone to Transformative Data Science

By Carlos Russo • March 22, 2019

Photo by Campaign Creators via Unsplash

Good data science does not imply good business. Certainly, good data science can lead to good business, but there’s no guarantee that even the best performing machine learning algorithm will lead to any uptick in revenue, customer satisfaction, or board member approval.

How can this be?  After all, data science teams are full of smart, well-compensated individuals driven by curiosity and empowered by technology.  How could they not move the bottom line?

In general, the output of a data science project is not, itself, a driver of impact. The output informs some decision or interacts with some system that drives impact. Clustering customers by behavior won’t improve sales on its own, but creating product bundles for those clusters might. Predicting late deliveries won’t improve customer satisfaction, but sending a push notification warning customers of the potential issue might. Unless your product actually is data science, there’s almost always a step that must connect the output of data science to the impact we want it to drive.

The problem is that we often take that step for granted. We assume that if the data science project is successful then the impact will follow. We see this assumption hiding in the most conspicuous places: in OKRs that measure new users and not algorithm performance, on dashboards that display revenue but not precision, in the single and unchallenged sentence on a planning document that states how a project will change the business.

Too often this how step is assumed to be feasible, reasonable, and without risk. But, in reality, the how is a guess. It’s a hope. It’s a hypothesis – one that we call the impact hypothesis.

The impact hypothesis is the idea that connects the output of the data science project and the impact on the business. It is the how upon which the transformation of your business hinges.

An Illustrative Example

Let’s consider a common data science project: predicting customer churn. The first line of the planning document states the goal as “to predict customer churn in order to reduce the number of churned customers through direct incentives and promotions.”

The data science goal is to “predict customer churn.” The document details potential solution paths, technological overhead, holdout group selection, features to engineer, relevant subject matter experts, and on and on.

The desired business impact is “to reduce the number of churned customers.” The document and soon-to-be-built dashboard define the exact metric by which to calculate churned customers and the cadence at which it is measured.

The assumption of how impact will occur is “through direct incentives and promotions.” It’s unlikely that the document devotes even one more sentence to how direct incentives and promotions will accomplish this. It’s simply assumed that they will.

The Dangers of an Unchecked Assumption

We asked before, “How can a successful data science project not be an impactful one?”

By assuming that it will be.

But if that assumption fails, the entire project will be for naught. It will mean wasted time and resources. When a data science project succeeds but the impact hypothesis fails, it can be devastating to the morale of the data team. If the data team is centralized, they’ll be reluctant to work with your team in the future. If the data science team is embedded, they’ll feel underappreciated and unmotivated. But all this can be avoided by identifying and challenging your impact hypothesis early.

That assumption fails all too often, and almost always because it was never fully vetted. Instead of making an assumption, we need to recognize that the how is a hypothesis.

The Process

State the Impact Hypothesis

First, we must explicitly state the hypothesis. In terms of our example, the impact hypothesis is “Targeting customers who would otherwise churn with direct incentives and promotions will reduce the number who ultimately churn.”

After seeing it written out, we might realize the hypothesis lacks specificity around implementation. A more precise hypothesis, like “Targeting online customers who would otherwise churn with direct email incentives and discounted promotions will reduce the number who ultimately churn,” will help us formulate an impact plan and direct future action.

Stating the hypothesis refines the idea and cements its details. It also removes the presumption of correctness, inviting the critical eye that is so badly needed and so rarely afforded. As with any hypothesis, our goal during critique is to identify when and how it can fail.

Vet the Impact Hypothesis

Now that we’ve dismissed the assumption, let’s critique the hypothesis.

How might the example’s impact hypothesis fail?  

  1. If we’ve saturated our customer base with promotions to the point where additional incentives have no impact.  
  2. If we run out of budget and cannot incentivize customers.  
  3. If customers are not leaving due to a cost issue.
  4. If customers are churning as an expression of protest.  
  5. If customers no longer have a use for the product.  

And countless other ways.

The point of recognizing the impact hypothesis isn’t to find an infallible one, but to identify and plan for the ways yours might fail. Every hypothesis will have points of potential failure (and if you can’t find them, you’re not trying hard enough).

Document and Communicate Your Findings

After identifying and vetting the hypothesis, document your findings. This nontechnical planning and scoping belongs in the larger project’s documentation, and its results should be shared with the data science team and all stakeholders. Doing so will enable the data science team to narrow their solution paths to ones that fit your impact plan. It will also help nontechnical team members ensure they don’t create barriers to your planned impact. Documenting and communicating your findings will protect the project’s impact both during the project and after it is complete.

Respond to Critical Failure

Some hypotheses will fail altogether under scrutiny. When this occurs, discard the project. Even if the data science project was exciting, the team should move on to a project with a sounder impact hypothesis. If you want to avoid sunk costs and broken hearts, vet the impact hypothesis before the project ever starts.

Moving Forward

The details of how data science will drive impact are so often left to be figured out at some point in the future, when the machine learning algorithm is humming along and (hopefully) hitting its numbers. It’s assumed that stakeholders will be able to take the data team’s output and turn it into impact. Yet we know that if this assumption fails, it is impossible for the data science project to be impactful, regardless of its precision, recall, or any other performance metric.

Here we’ve outlined a process to critically consider the how. By identifying, vetting, and communicating the impact hypothesis, we treat the how as being just as important as the data science and the impact it connects. With a strong impact hypothesis, the data science output connects directly to the impact. Without one, a project falls apart: not quickly, but only after the data science is done and ready to become a sunk cost.

The impact hypothesis is the keystone of applied data science; it’s the idea that binds together the output and the impact. A strong impact hypothesis is the difference between data science for its own sake and data science that transforms your business.