How To Hire A Data Scientist

The most challenging positions to recruit for are often in the newest hot profession. It must have been fun to find a silicon semiconductor specialist in 1954; who wanted to study sand before the silicon transistor was invented? An HTML expert in 1996? SGML experts all had cushy government jobs or worked for publishers. In the last two years, a new creature known as the “data scientist” has emerged as one of the must-have hires for many firms. Here at Bright we have assembled an outstanding Data Science group that built our Bright Score, provides interesting data to the media and general public via Bright Labs, and causes endless grief for the engineering team that needs to scale our ideas to our users.

Let’s take a look at our science team on paper. The team is an eclectic mix, consisting of one former nuclear physicist, one neuroscientist, one geophysicist, one astrophysicist, and a mechanical engineer. At a glance, you may think we are trying to invent Warp Drive, and on top of that, not ONE of them had Data Scientist as their last job title. However, each and every one of them has had years of extensive training from some of the brightest minds in the world, has spent countless hours researching, experimenting, analyzing, and documenting solutions to real-world problems, and has had their work critiqued by their peers and published in academic publications. As the old adage goes, “things are not always what they appear to be,” a statement that is very true when it comes to finding Data Scientists.

How does one go about hunting these camouflaged “purple squirrel” scientists?

To begin, what is a Data Scientist? It depends on whom you ask. Common (incorrect) definitions are:

  1. A Hadoop expert. To those hiring managers who are certain they need a Hadoop expert, I submit @DevOps_Borat.
  2. A Machine Learning expert. Not every construction project needs a hammer.
  3. Kagglers.

I define a Data Scientist as someone who knows just enough programming, system administration, and statistics to transform a large, possibly heterogeneous set of unstructured data into actionable intelligence or an actual product.  The Data Scientist must also have sufficient visualization and communication skills to be able to convince someone that they did it correctly.

We’ve found that one of the least effective methods for finding a Data Scientist is to log into LinkedIn and search for “Data Scientist.” There aren’t that many, as Data Science is an emergent field. There are few, if any, Data Scientists with five years of experience, because the job simply did not exist (at least not in its current form).

Where, then, does one find the elusive Data Scientist? Famed bank robber Willie Sutton is said to have robbed banks “because that’s where the money is.” If you want to find a Data Scientist, find yourself a disgruntled postdoc toiling away on brilliant scientific research, but failing to land a professorship because … all the professor jobs are taken! (For those of you not familiar with academia, after earning your Ph.D., you typically work for 2-6 years as a postdoctoral research fellow. You are semi-autonomous, but typically work under a professor who was fortunate enough to get their Ph.D. in the good old days when there were actually professorships to be had.)

Private companies that are detached from the world of academia sometimes give candidates such as these postdocs a hard time, assuming that they must not be hard workers and won’t be able to keep up in fast-paced environments because they haven’t had a “real” job. In many cases the opposite is true: people in academia often have to work twice as hard. The grant funding they receive is insufficient to pay for tools that many take for granted (“You don’t need that $1000 software license! Write that code yourself!”). Yes, as in any other profession, there are a few slackers in academia. There are some questions you can ask to identify and eliminate them early in your selection process:

  1. “Tell me about some peer-reviewed papers that you published as first author.” I want people who can finish long, complicated tasks. Nothing takes longer, or is more complicated, than publishing a peer-reviewed paper. To give you an idea of what that entails, imagine all of the backstabbing people competing with you at work and in your profession, and put them behind a wall of anonymity where they critique and criticize every little detail of the project you have been slaving over. That is a peer reviewer.
  2. “Tell me about some code you’ve written that other people use.” Academics tend to be “good enough” programmers. I don’t need it to be elegant, but I do need it to work. The best test of whether code is “good enough” is whether at least two other people use it.
  3. “Explain to me the statistical analysis you used in your thesis.” Statistics is like music. Some people play notes, some people make music. People who really understand statistical concepts at a fundamental level usually make the best Data Scientists. Anyone can run an Analysis of Variance in Excel, but is that really the best approach? Ultimately, the worst thing your Data Scientist can do is get fooled by the data.

LinkedIn can still be helpful, and is particularly useful for finding a Data Scientist you respect. Once you identify one, find their connections who are still toiling away in academia, look up their email addresses on the university web site (yes, they make it that easy for you), and send them an email.

What is true in sports is also true in hiring — it is better to find a superstar in the draft than it is to find them as a free agent.  They are cheaper, and you get them during their most productive years.

David Hardtke, Ph.D., Chief Scientist

Josh Barger, PHR, Director of People Operations

8 Comments on “How To Hire A Data Scientist”

  1. sonali
    November 13, 2012 at 11:59 am #

    I like your definition of data scientist. There are certainly many fields where research has been devoted to data mining. For practicing data scientists, 80-90% of our work is putting together petabytes of log files and finding actionable results. Indeed, we want someone who is willing to take a hacker’s attitude to new approaches to solving problems. A researcher with a knowledge of sed and awk and hacky python would be a dream.
    I am not sure about whether PhDs are the best source–at least it hasn’t been reliable. Sometimes there is dismay that we don’t have a SAS license. More typically, the data is not scrubbed and primed for more interesting analysis beyond histograms or maybe some regressions. I think there is a necessary distinction between research scientist and data scientist. Often times people are hiring research scientists for the role of a data scientist, and retention is an issue because the intersection of intellectual stimulation does not necessarily lie with playing with data in the raw sense and creating some type of automated visualization or summary of the data.
    What is the best source? It varies. Industry experience is limited to those who have actually worked with lots of data. In the end, you need to love data and have the ability to read graduate level math. They can understand and apply stochastic gradient descent, implement it in python and have it run independent of a developer- they can source the data and do something actionable with minimal supervision all from a command line. Add in cool stuff with R or D3- there’s your data scientist.

  2. David T Macknet
    November 13, 2012 at 12:17 pm #

    For those of us toiling away in the trenches, though, there seems to be very little in the way of advertising. It’s nice to say that they should be found … and some of us WANT to be found … so, how to make that crucial connection seems to be the problem.

  3. Justizin (@justizin)
    November 13, 2012 at 4:59 pm #

    Actually, I’m just going to say, Data Science is not a new thing. In today’s world, everyone who does any job should have a good amount of computer literacy, more than people in the past, and longer than two years ago I worked with some people called “Analysts” who crunched numbers to try and form projections. Information Science is older than Dewey Decimal and the notion that Data Science is a new thing because it’s more common to have lots of data is just plain silly.

    Realistically, if you have someone called a Data Scientist who lacks any of that true grit, they aren’t qualified to do their job. The same was true of analysts ten and twenty years ago, but companies like Microsoft, Lotus, and Corel gave them training wheels, which they typically used for their entire career.

  4. Céline Van Damme
    November 14, 2012 at 11:10 am #

    I totally agree with @Justizin. Data science has indeed existed for a very long time. The only difference is that we’re now thinking about data science and the data scientist in a different way. The same holds for big data: it is not something new and has already existed for a very long time. Most large companies have a huge database of unstructured data that they collected many years ago. They simply didn’t realize they had so much data and were just storing it in MySQL databases. Social media has made big data popular. Most companies now realize that there might be considerable value stored in this unstructured data set, and they now need data scientists to unlock the hidden knowledge stored in their databases.

  5. Jock Busuttil
    November 16, 2012 at 6:38 am #

    You’ve certainly nailed the technical skills to look for in a data scientist, but I think there are other facets that companies are looking for in an outstanding data scientist. I’m by no means a data scientist myself, these are purely observations.

    So first of all, a data scientist needs to have the technical skills to do the job and David’s summarised those above.

    Secondly, a data scientist needs creativity to consider how to derive insight from the data in possibly unexpected ways. Particularly now that it is possible to analyse whole populations of data, rather than samples and models, I would argue that the conventional techniques may not always be the best approach.

    Thirdly, a data scientist needs to be able to communicate. This means being able to translate the complex, statistical analysis into simple pieces of insight that non-statisticians will understand, whether expressed verbally or visually.

    Data science may well have been around for a while, but I think that today’s Data Scientists need these extra strings to their bows because that is typically what organisations are demanding of them.

    Would you agree?

  6. Mike Lee
    November 19, 2012 at 4:07 pm #

    I love your last line in the article. As a recruiter, I tend to look for hiring managers who “draft for talent rather than need” … yet another phrase to help articulate your point.

