This post was written by Brendan Herger, Senior Data Scientist on the Corporate Training team at Metis.
tl;dr: Data science is transforming corporate America, and bringing your company into the revolution is easier than you might think.
The word 'pioneering' is rarely associated with banks, but in a unique move, one Fortune 500 bank had the foresight to create a Machine Learning center of excellence that launched a data science practice and helped keep the bank from going the way of Blockbuster and so many other pre-internet relics. I was fortunate to co-found this center of excellence, and I've learned a few things from that experience, as well as from building and advising startups and teaching data science at other companies large and small. In this post, I'll share some of those insights, particularly as they relate to successfully launching a new data science team within your organization.
I won't beat around the bush: Data science is "The Sexiest Job of the 21st Century". Perhaps more importantly, building a data science practice is one of the most valuable investments a company can make.
No one wants to be left in the dust of the data revolution. This is why leaders at companies like Airbnb, Facebook, and Google are hiring data scientists at breakneck speed and attaching them to every engineering team. These hires aren't for vanity. Machine learning is transforming corporate America, with highly visible and lucrative wins including AI assistants, driverless cars, and machine translation paving the way for less visible but very important wins like authorization fraud detection and relevant search.
Every company is unique, but below is a battle-tested playbook for getting a data science practice off the ground.
Some company leaders will understand without much convincing that inaction could mean falling behind the industry. Others might be harder to persuade. Either way, it’s helpful to find a case study showcasing how a similar company is profiting from its data science team, or how new products in your industry are centered around the benefits of data science. Even better, a brief proof-of-concept project could help highlight the low-hanging fruit you and your team could tackle.
Once you've got support and buy-in from your leadership, you can begin a backlog of projects that data science would enable, demonstrating how your team will integrate with existing software engineering and data engineering resources. Even if your existing team and leadership can't yet phrase their pain points and ideas as data science problems, this backlog will help prioritize hiring and initial team projects.
With support and ideas for potential projects, you can now start hiring. We'll cover hiring in depth in a future blog post, but suffice it to say, this might be the most difficult part of the journey. Data scientists are in demand. Moreover, those with experience building data teams from the ground up likely know their market value and can afford to be picky. Depending on what you need, the challenges could be significant.
To get candidates in the recruitment funnel, I'd recommend generating organic leads through Meetup groups and conferences (including conference receptions). Sourcers and hiring agencies can be worthwhile, but formal emails are easy to ignore in a hot market. A quick survey of my data science friends indicates that we each receive an average of 30 recruiter messages a week, which makes it difficult for any one message to stand out.
It'd be wise to choose first hires with proven track records of building products and/or mentoring others. As the team grows, there may be room for R&D and specialized roles, but early on, it'll be all hands on deck proving value.
If you happen to have existing software engineers or data engineers with a math background (or a good amount of grit), it may make sense to provide them with time and training to skill up into a data science role. Existing team members know the company and culture already. Also, providing a skill-up opportunity can help with retention and can help keep your A-team up-to-date and mentally challenged.
Once you've got your motley crew together, it's important to build a strong foundation for your growing team.
Data scientists come from a variety of backgrounds and practices and might bring to the table a wide array of skills, workflows, and preferred tools. Have conversations early and often about best practices, including what team members can expect from each other. If you subscribe to the Agile or Scrum dogmas, now's the time to indoctrinate.
Nothing forms a team quite like a shared crucible. If there's a particularly high-value, low-effort project in your backlog, take it on. This will help your team learn how to work together while gaining visibility within the company. Otherwise, data engineering and data lake projects could lay the groundwork for many more data science projects and can help your team get familiar with their new data.
After you get your footing with your first few projects, begin talking about what your standard workflow looks like, libraries and infrastructure you'd like to build, and the cost of technical debt.
I'd also recommend scheduling monthly happy hours or other fun events. It's important for teammates to trust each other and get to know each other outside the office. Also, your new hires are probably getting LinkedIn messages already, and beers are less expensive than more recruiting.
Alright, now you’ve got a strong team of data scientists who've proven themselves with some minor projects. On this foundation, you can start earning the support and buy-in your leadership has loaned.
Though you might have done a few initial projects, your first major project will help to define your team and your team's role within the company. Choose a major project that can be done in milestones, that provides a high-visibility win, and that you know you can deliver on. Great first projects include creating a new data warehouse, creating a homegrown alternative to a vendor model, or creating a viable new product offering.
Once you're about 60% done with your first project, start presenting to other groups to get their feedback and buy-in (and shake out any new project proposals). At about 80% done, start presenting the project up the food chain to help leadership understand how their investment is paying off.
Once your first project is done, keep pumping them out!
Before you go and conquer the world, there are a few last lessons learned that might be helpful:
1. Augment, not replace
It's easy to fear being replaced by the machines. Help existing staff realize that, in general, your team will augment and streamline their roles rather than replace them. Most of my data science projects have alleviated the boring parts of others' roles and have allowed them to leverage their specialized skill sets.
One of my favorite recent projects allows users and moderators to determine if a Reddit post contains spoilers. Another common workflow is to flag toxic content for human review. On the corporate side, a recent project classified and triaged incoming messages, allowing lawyers to spend less time sorting mail and more time practicing law. Yet another great case study enabled security analysts to spend more time evaluating trends and less time scrolling through email logs.
2. Tribal knowledge
While data science is a hot new skill set, there's still a lot of value in the domain (tribal) knowledge that your company has built up over time. Whether it's knowing that cdt really means charge_off_date, or that the company's proxy requires voodoo witchcraft to work, there's a lot that the existing staff can teach your new team. Embrace this help; don't fight it.
Just as your peers are learning about data science from you and your team, find ways to learn from the old guard.
3. Embedded vs. monolith
One of the largest ongoing discussions in data science (other than tabs vs. spaces) is whether data scientists should be embedded (data scientists on each product team) or monolithic (all data scientists on one team).
The embedded approach allows data scientists to build product knowledge and specialization and to keep data science goals aligned with product goals. The monolithic approach enables more standardized workflows and skill sets and provides data scientists with a brain trust of peers.
Another popular approach touches on the best of both worlds by hiring data scientists into a data science 'guild' that has regular guild meetings and deploys data scientists to different product teams.
4. Project planning
The most common mistake I've seen is running a data science team like a software engineering team. While there is a lot of overlap, software teams are generally able to define features and milestones at the start of a project, whereas data science projects tend to be less linear, with scope evolving as data quality, research, and model training inform future iterations.
Go forth and conquer
Now that you've got your playbook, go out and get some buy-in. Data science is rapidly transforming corporate America and no one wants to be left in the dust.
Want to hear more from Brendan? Check out his blog, chock full of technical and non-technical expertise.