
10 Tips for Hiring a Data Scientist into a Tech Company

What gap are you trying to fill? Before you get close to offering a Data Scientist a job you should be clear in your own mind what skills gap in your organisation you are trying to fill. In my experience these are good reasons to be hiring a Data Scientist:
  • You need someone with skills in mathematics, and in particular statistics, who can do a better job of understanding data and creating meaningful outputs than your average accountant or computer scientist.
  • You need someone who thinks and operates in a numerically framed way, someone who is comfortable with representing concepts as graphs and formulae.
Those 2 core competencies are to be found in any successful Data Scientist. You may be tempted to frame the role in the following terms:
  • You need someone to make sense of a large dataset, to understand the dimensionality and the "shape" or distribution of the key components of that data.
  • You need someone who can create, improve or debug some very sophisticated algorithms, i.e. the kind of algorithms that a software engineer would claim to be too complicated to be practical.
But these 2 role traits are per-project skills. If you only have 1 dataset or 1 key algorithm then you don't need to hire a Data Scientist; you need to offer a short-term contract to a Data Science consultancy or a trusted academic. If you don't have a stream of interesting data problems to throw at your Data Scientist then don't bring that skill in-house, because there are more effective options. Thus...
Tip 1 : Make sure the role is big [data] enough

You need to think about how the Data Scientist will integrate with the rest of the tech team. I have come across organisations where the Data Scientists are engaged in a full-on insult contest with the software developers. This kind of issue comes from software developers typically not being good at maths and Data Scientists not having day-to-day industrial coding skills, so both sides fear that the other is a threat to their skill dominance. The diagram shows the 3 relationships that are formed when bringing in a Data Scientist: Developer to Data Scientist (A), Product to Data (B), and business to Data (C).


For success, the critical relationship is between the Product and Data teams (B); the other 2 relationships are harder to form. A good product manager will be able to maintain the motivation of both the developer and data science teams by drawing on their strengths and not exposing their weaknesses, and eventually a level of respect for each other's talents will lead to a Developer to Data Scientist working relationship (A). The relationship with the business (C) is also hard, because business expectations of data projects are currently very high and the cost and lead times of those projects make them a major investment; as a result it will take time for that relationship to form in a sustainable way. Product managers should already be adept at creating bridges between teams. Thus...
Tip 2 :  Align the role with the Product Management team

A positive aspect of aligning with the Product team is that the targeting of the Data Science effort goes towards product goals, and the statistical skills are not sidetracked into more general BI. It is not that Data Scientists can't do BI, but if you want them to go in that direction, you should get that in the job description (which will alter who applies).

In the same way that you should keep the job description focused on the product area, you should take care to avoid including detailed engineering or environment specifics. It is true that a Data Scientist will be more productive in a commercial context on day 1 if they are already familiar with the systems (source control, project management, build and release processes), but they can learn these systems and the introduction of data centric development is very likely to change these established processes over time. Should you teach Data Scientists to do BDD or Software Engineers to use Jupyter Notebooks? Either way you should keep the job description tight and avoid all the nice to have requirements that will cut down the applicant list. You can take these things into account, but you want the strongest Data Scientist not the one who is most like your average developer. Thus...
Tip 3 : Stick to Data Science skills in the job description

At this point you might find that you are not looking for a Data Scientist at all; perhaps you are in fact looking for a Data Engineer or a Data Ops person to fulfil your business goals. These roles are more aligned with systems engineering and operations skills, and what you are really looking for is specific experience with big data and ML workloads. If this is the case please consider getting in touch via contact@infer.systems ;)

So before you advertise the position you will need to have a budget in mind; I suggest you take the average Developer salary and double it. Commercial Data Scientists are in high demand, and in such an inflationary market there will be little or no correlation between the salary and the strength of the candidate. By paying more you get more experience, but no real guarantee of more skill or of the ability to achieve ambitious goals. Thus...
Tip 4 : Do not put a salary on the job advert, you want to see all applicants

Data Scientists are rare beasts when compared to the overall tech market, so you need to use the same hiring techniques as you would do for any hidden talent pool. Thus...
Tip 5 : Get the word out via Meetups and specialist Recruiters

Whether you get a trickle or a flood of applications will depend on your specific advert and role; but you will notice that the applicants fall into 3 basic categories:
  1. People that wish they could do Data Science, but cannot demonstrate the capability;
  2. Engineers who can do enough maths and have learned commercial Data Science;
  3. Physicists (or people from a similarly computational and data-intensive science discipline).
It is possible to find what you are looking for in the first 2 categories, but unless you are certain you have found the best candidate ever you are taking a big risk. I am an Engineer myself, and I know that I am not an A-list hire for a Data Science role. Engineers make very good Data Engineers and Ops team players. But just filtering for Physicists is not enough: the world of academia is not a good place to develop the skills needed to handle the commercial pressures of Data Science. There is time pressure, but nothing like that in a commercial context; there is teamwork, but at an insignificant scale compared to large tech companies; there is openness and transparency, but nothing like the secrecy culture of companies like Apple. Thus...
Tip 6 : Are Engineers just as good as Physicists? No, so filter for solid computational science backgrounds AND commercial experience

A good Data Scientist will be well aware of the commercial value of their discoveries and insight in a role, so discussing the impact they have had on the organisations and projects they have worked on will be hard to do in a public context, and you will see some very vague language in good CVs. Candidates will also struggle to describe their achievements in an interview without revealing indicators of their employer's IP. Thus...
Tip 7 : Don't interview applicants from competitive businesses unless you want to end up in court

And for your own business ambitions, you don't know who else this Data Scientist is interviewing with. You still need to be relatively closed about the goals and product directions, but with an NDA in place at least you can get deeper into the science with less worry around context. Any candidate with commercial experience will understand the need for an NDA. Thus...
Tip 8 :  Interview under mutual NDA in all cases

During the interview process you are actually looking for research skills, so you need to give the applicants the opportunity to show that they can research your problem. Drip feed them the areas you are working on in the first interview and then go back over those areas in a subsequent session. In particular you want them to have read and understood the generally applicable algorithms and models prevalent in your problem domain. Thus...
Tip 9 : Use at least 2 interviews and look for the homework they have done between them

In Developer interviews there is often a coding test. The equivalent for a Data Science role is harder to execute: the data is large, the candidate may not have access to the storage and processing capabilities required to analyse it, and the data may contain PII or commercially sensitive information that cannot be exported from the confines of the corporate IT systems. But you still need to validate the skill set, see how the candidates cope with a product description, and see how well they communicate their intermediate and final thinking. It should be possible to find a data set that you know well and can interactively explore with the candidate; much as pair programming can create code, pair data exploration can lead to a better demonstration of real capabilities. Thus...
Tip 10 : Use paired interactive data exploration to assess data handling and soft skills
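
As a rough illustration of how such a paired session might open, here is a minimal Python sketch that uses only the standard library. The column name `session_minutes`, the sample values and the helper `summarise_column` are all hypothetical and not taken from any real data set; in practice you would substitute a small, non-sensitive data set that you know well.

```python
import statistics


def summarise_column(values):
    """Return basic 'shape' statistics for one numeric column.

    None marks a missing value. This is an illustrative helper,
    not part of any library.
    """
    present = [v for v in values if v is not None]
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3
    q1, median, q3 = statistics.quantiles(present, n=4)
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "min": min(present),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(present),
        "mean": statistics.mean(present),
        "stdev": statistics.stdev(present),
    }


# Hypothetical sample: minutes per user session, with two gaps and one outlier
session_minutes = [12, 15, None, 9, 14, 11, 240, 13, 10, None, 16]

summary = summarise_column(session_minutes)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The numbers themselves matter less than the conversation they start. A mean far above the median, or a handful of missing values, gives the candidate an opening to ask what the data represents, to propose how to treat outliers and gaps, and to explain their reasoning out loud.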

If you manage to follow these tips then the role should be right, the management structure around that role should be right, the job description and salary will pull in the right candidates once you get the word out, you will know what to filter the CVs on, when to set up the NDA and how to check the skills of the applicants. The only trouble is that the demand still far outstrips the supply of good commercially minded Data Scientists and so you may have to revise your budget upwards to get to the point where you can meet the expectations of the candidates in both salary and interesting project terms.

Infer Systems can help you with the process of getting the most value out of your Data Science team by supporting them with the cloud infrastructure and software engineering tools required to effectively run a commercial ML or data driven product. Please get in touch via contact@infer.systems to find out more.
