The do’s and don’ts of starting a data science team

Last week, the Prime Minister’s chief strategic adviser – Dominic Cummings – wrote a blog post which attracted a huge amount of media attention. He called for a radical new approach to civil service recruitment – suggesting that data scientists (among others) should play increasingly important roles.

But while data scientists were top of Cummings’ list, it was his call, later on, for more ‘weirdos’ in Whitehall which really caught the media’s imagination. Here, we outline some do’s and don’ts when building a data science team.

For anyone kicking off the year with a new data science initiative, we applaud you! Embedding data and technology into decision-making processes can be a wonderful thing. To help you along your way, here are a few do’s and don’ts born of experience.

Don’t… Assume R&D is easy
Do… Appoint a technical leader
If you’ve been tasked with managing this initiative, but you’re not an experienced data scientist, then you need someone who is. You need a team leader who lives and breathes selection bias and measurement bias, and who knows when a result is meaningless. Without this experience in your team, you will at best waste time and resources, and at worst create dangerously unsound technology.

Don’t… Just hire weirdos and misfits
Do… Carefully craft your team
The notion that data scientists are geniuses who can solve all your problems, armed only with a computer and some data, is flattering – but ridiculous. Data scientists come in many flavours, with different interests and experience, and the problems worth solving require a team effort – with the best ideas coming from diverse teams who can communicate well.

Don’t… Trust textbook knowledge alone
Do… Hire for experience too
There is data science knowledge you can glean from a textbook, and then there is the hard-earned stuff you learn from years of building models and algorithms with real data, implemented in the real world. Nothing makes you understand overfitting and the limits of theoretical models like living through that cycle a few (hundred) times.

Don’t… Ignore ethical issues
Do… Take an ethics-first approach
Get ahead of any ethical and legal issues with your work, or the data you are using. Don’t assume it’s OK to do something just because you heard a Silicon Valley start-up does it like that.

Don’t… Obsess over the latest academic papers
Do… Identify questions
Normal rules of business apply to data science; you want a return on your investment. Start by identifying the intersection of high-value business problems and the information contained in the data. You could ‘dart about’, trying out ideas from cool papers you’ve read, to see if anything useful comes out. But such unstructured work is akin to randomly digging for treasure on a beach. Get yourself a metal detector: identify business problems first.

Don’t… Show off
Do… Keep it simple, stupid
Unless you have been specifically asked to build something superficially clever and incomprehensible (and this is a genuine objective for some), then you should use interpretable models first. Often this will be good enough. Only introduce complexity if you need to, and use a simple model as a baseline against which you can measure improvements.
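As a rough illustration of measuring against a baseline, here is a minimal Python sketch. Everything in it is an assumption made for the example: the data is invented, the function names are hypothetical, and the "candidate" is just a one-feature straight-line fit. The point is only the pattern: score the simplest possible model on held-out data first, then ask how much any more complex model actually improves on it.

```python
from statistics import mean


def fit_mean_baseline(xs, ys):
    """Simplest possible model: always predict the mean of the training targets."""
    training_mean = mean(ys)
    return lambda x: training_mean


def fit_line(xs, ys):
    """One-feature least-squares line; the slope and intercept are easy to interpret."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def mean_abs_error(model, xs, ys):
    """Average absolute prediction error of a model on the given data."""
    return mean(abs(model(x) - y) for x, y in zip(xs, ys))


# Invented data, purely for illustration.
train_x = [1, 2, 3, 4, 5, 6]
train_y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
test_x = [7, 8]
test_y = [14.1, 15.9]

baseline = fit_mean_baseline(train_x, train_y)
candidate = fit_line(train_x, train_y)

# Both models are scored on the same held-out data.
baseline_mae = mean_abs_error(baseline, test_x, test_y)
candidate_mae = mean_abs_error(candidate, test_x, test_y)
improvement = baseline_mae - candidate_mae

print(f"baseline MAE:  {baseline_mae:.2f}")
print(f"candidate MAE: {candidate_mae:.2f}")
print(f"improvement:   {improvement:.2f}")
```

If a fancier model cannot beat the straight line by a margin that matters to the business problem, the extra complexity is probably not worth it.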

Don’t… Propagate hype
Do… Manage expectations
So, you’ve been thrown some resources to set up a data science team and you’re embedded in an organisation that doesn’t necessarily understand what data science is. With such power comes responsibility! Avoid hype. Manage expectations. Help your peers and leaders understand what you are doing, and make sure they have input to it. This is a joint effort and they bring important domain knowledge. Agree on goals, and be transparent about progress.

Don’t… Command and control
Do… Create a scientific culture
Do your team feel they can challenge the scientific views of the leadership—or are they scared of being ‘binned’ if they step out of line? Your team is on a mission to solve a problem, and it is unlikely the path will be an easy one. Your data scientists will spend most of their time stuck, navigating a sea of unknowns, while in pursuit of answers. Scientists need to be able to talk freely about what they do and don’t know, and to share ideas with each other without any sense of one-upmanship.
