Wednesday, March 19, 2014

Data Scientists ... where are they?


The following blog post was inspired by an article


http://www.kdnuggets.com/2013/12/unicorn-data-scientists-vs-data-science-teams-discussion.html

and was posted as a comment in response to it ....




I was intrigued by the article, as I have been struggling to find and hire practitioners who have the required skills. I agree wholeheartedly with the Venn diagrams in the article.
At makeplain we deliver improved outcomes to our clients derived using advanced analytic insights from operational data. I personally have been using the three skill sets plus business domain knowledge for 20 years to deliver operational recommendations to our customers.

I believe that all three skills are required in one individual to most effectively accomplish the task of creating actionable insights from large volumes of operational data. Individuals with these skills are few and far between, most having learned the skills from practical experience. A team of specialists cannot effectively deliver the required result in a timely fashion because the process is iterative, requiring some back and forth between domains. The lack of cross-disciplinary knowledge on a team results in significant gaps in the deliverables passed between team members. For example, a database specialist cannot build a high-quality analytic file to hand off to the mathematician, and the mathematician cannot adequately describe the requirement to the database specialist, because the mathematician does not understand the nuances of manipulating potentially terabytes of data. Likewise, a mathematician who designs an algorithm does not know enough about the computer science implications of a large computation requiring parallelization, and this results in impractical analytic processes. Certainly there are individuals who have knowledge in more than one domain, and teams containing these individuals get better results. Nothing beats teams of individuals who have all three capabilities, and that is what we try to train at makeplain.

Business domain knowledge is a secondary requirement if you have a team leader who brings it to the team, but lone data scientists require business domain knowledge as well.
We have had the most success hiring engineers, as their background tends to be more general and they get moderate exposure to all three domains. Mathematicians and computer scientists tend to be more specialized (mathematicians being the most specialized). We find it easier to teach more math to engineers than to teach database skills to a mathematician. Certainly some individuals take out-of-program electives that give them broader exposure, and we typically seek out these individuals as prospective hires.

I believe the current lack of individuals with the requisite skills is slowing the adoption of advanced analytics by corporations and has also been the barrier in the past. We certainly see a lot of current discussion about the topic (Big Data, Cloud Analytics, Data Science etc.), and I have seen more adoption by corporations recently than over the previous 20 years. My personal belief is that there will never be enough data scientists to fill the business need in its current form; only a stable percentage of the population has the interest and requisite skills to pursue the necessary education. There is a minimum educational requirement to pursue this career. Only exceptional individuals could learn this on the job without quantitative post-secondary education. I think that in the near future we will automate the end-to-end process (across all three domains) of insight creation and business recommendation/execution and embed these processes into operational systems, creating expert operational systems. In this possible future we won't require armies of nonexistent data scientists, and we can direct the realistically smaller number of practitioners to address analytic problems that are difficult or impossible to completely automate. Automated analytics delivering a 7 out of 10 grade can be applied to the tens of millions of "near-random" decisions that corporations operationally execute each and every day.

At makeplain we are trying to accomplish this level of automation and have made significant progress toward this vision over the last 10 years, made easier every day by Moore's law and other machine learning/database advancements brought to market by many great companies.

Significant adoption of advanced analytics by corporations and government, used strategically and tactically to make daily operational decisions, can create significant economic efficiency, taking the pressure off our current debt-laden and slowly crumbling economies.

Comments welcome....