Photo by Rawpixel via Unsplash
This post was written by Damien Martin, Sr. Data Scientist on the Corporate Training team at Metis.
The head of product may wonder whether a recent cold snap led to an increase in traffic to the company website because people were trapped inside. The CEO might look at engagement statistics and wonder if mobile user engagement varies significantly with the make and model of the phone.
These ideas get forwarded along to the data team, with the suggestion to investigate, and maybe put that expensive visualization software to work to create a dashboard to track metrics. At larger companies, this task gets passed along to a junior analyst, while at smaller companies, it’s one more thing for the overburdened team of data scientists to do. All too often, this new metric gets made and glanced at a couple of times before joining the collection of unused (and unmonitored) dashboards.
Part of a proper data science project – even one as simple as tracking a new metric – should include determining the value of the metric, assigning someone to be responsible for monitoring it to ensure it doesn’t break, and having a plan for what to do with the result. That is, the project should be properly scoped.
You might be thinking to yourself:
Wait a second...we’re just talking about ad hoc queries here. We shouldn’t need a full project spec just to see if there’s something “there.” If we can’t go to our data team to answer data questions, why do we have them in the first place?
Because although being a data scientist is often touted as the sexiest job of the 21st century, it might surprise some outside the field that the role comes with its fair share of unglamorous grunt work. It’s been claimed that 80% of data science is cleaning and preparing data, meaning that ad hoc queries compete for your (expensive!) data scientists’ remaining time. As these queries pile up, your data scientists might even start looking for opportunities elsewhere, and you’ll be left with an expensive hole in your team.
This doesn’t mean that an ad hoc analysis of an interesting idea shouldn’t be pursued; instead, why not empower your team with the training and tools to investigate their own questions? This process, often called Democratizing Data, can lead to a number of improvements when done properly, including:
- Increased job satisfaction (and retention) of your data science team
- Automatic prioritization of ad hoc queries
- A better understanding of your product across your workforce
- Quicker training times for new data scientists joining your team
- Ability to source suggestions from everyone across your workforce
Democratization leads to better product understanding
Democratizing data means more than just enabling people to make queries on the data. It means helping them develop the skills to be able to read graphs, think about relevant scales, and interpret what the data is saying. By training your employees in elementary data analysis and visualization, you are allowing them to interpret the dashboards displayed in the foyer at a deeper level than “up and to the right is good.” They will not only be able to make their own inquiries of the data, but they will be able to observe patterns in the already-published KPIs and will discover the biggest opportunities within their own responsibilities.
Putting democratization into practice
Your data scientists are there to help you assess ideas and to model and evaluate proposals. Let them own this process. All you’ve done is remove them from the initial process of ideating in order to maximize the use of their time and abilities, as well as embolden the rest of your workforce. Once a crowd-sourced idea shows promise in the initial phase, and an actionable insight is identified, you can hand it back to the data science team. This spares them from chasing false leads and lets them focus on the ideas most likely to pay off.
These vetted ideas are worth fleshing out into an actual data science project, where they can be checked for statistical rigor, and new data can be collected in a properly designed experiment.
Democratizing data doesn’t replace the need for data scientists, who are aware of subtle statistical traps that occur when analyzing data. Instead, it creates a process to take crowd-sourced ideas and efficiently turn them into proper data science projects, complete with scoping and responsibility for maintenance.
Democratization and the data dictionary
Documentation for databases (and data products in general) is usually pretty bad. Just like their counterparts in software development, few data scientists enjoy the process of writing instructions for databases. It is dry and dull, and they would much rather be making that awesome model.
Giving everyone access to the data and allowing them to make their own queries only makes sense if instructions for the databases are easy to find and understand. Definitions of terms such as “churned” or “session” need to be standardized so that they mean the same thing in every analysis. So while the process of creating a data dictionary won’t be fun for your data science team, it will allow them to be more consistent with terms between different projects. Additionally, it will allow new hires to become productive much faster, instead of spending the first month of employment learning the meanings of the different fields in the tables.
To encourage your data scientists to spend the needed time building a data dictionary, remind them of all the time they will free up from ad hoc queries, and encourage them to focus on the quicker productivity of future colleagues.
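To make the idea concrete, here is a minimal sketch of what a shared data dictionary could look like if it lived in code. Everything in it is an assumption for illustration: the `TermDefinition` fields, the table names, the owner, and the 30-day and 30-minute thresholds are all hypothetical, and your own definitions will differ. In practice a dictionary like this might just as well live in a wiki page or a spreadsheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermDefinition:
    """One entry in the data dictionary."""
    term: str
    definition: str
    source: str  # table the term is computed from (hypothetical names below)
    owner: str   # who to ask when the definition is unclear


# All entries are hypothetical examples, not real definitions.
DATA_DICTIONARY = {
    entry.term: entry
    for entry in [
        TermDefinition(
            term="churned",
            definition="User with no session in the past 30 days.",
            source="analytics.sessions",
            owner="data science team",
        ),
        TermDefinition(
            term="session",
            definition="Consecutive activity with gaps under 30 minutes.",
            source="analytics.events",
            owner="data science team",
        ),
    ]
}


def lookup(term: str) -> TermDefinition:
    """Return the shared definition of a term, ignoring case and whitespace."""
    key = term.strip().lower()
    if key not in DATA_DICTIONARY:
        known = ", ".join(sorted(DATA_DICTIONARY))
        raise KeyError(f"'{term}' is not defined yet; known terms: {known}")
    return DATA_DICTIONARY[key]


print(lookup("Churned").definition)
```

The point of the sketch is that every analysis pulls its meaning of “churned” from one place, and that asking for an undefined term fails loudly instead of letting each analyst invent a private definition.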
Democratization and prioritization
When the culture moves from “ask a data scientist” to “let me investigate,” people will naturally think a little more carefully about the usefulness of a question before investing too much time. It is easy to float an idle query when investigating it doesn’t cost you any of your own time. For example, take the question raised by the head of product in the opening paragraph: “Is our site seeing more traffic due to people being stuck inside because of poor weather?” For most websites, the answer to this question doesn’t matter. Even if people are visiting the site more because of bad weather, you can’t use this insight to generate more traffic (unless your product somehow controls the weather). If the hypothesis were about something more directly under your control, then it might be worth pursuing. But outside of niche cases, such as seeing spikes that necessitate renting more instances from a cloud provider, such a query might be interesting but not actually provide actionable insight.
What next steps can you take to get started?
Democratizing data isn’t giving everyone (read-only!) access to your database and then letting them do what they want. Instead, it’s about encouraging creative idea generation from a wider swath of your team, it’s about learning how to take steps toward a data science project that could yield benefits for the company, and it’s about more effectively utilizing the expansive talents of your data science team.
For most non-data employees, some degree of training is necessary. There are many ways of doing this, including sponsoring employees to take online courses or conducting monthly company-wide meetings focused on updates from the data science team.
At Metis, we specialize in data science training for all levels and have courses designed to help your teams develop their data literacy together. Some of our offerings to help get you there are Data Literacy, Data Visualization, and Intro to SQL.
Learn more about Metis Corporate Training here.