Josh Wills (Global) - Generation Why? Are the Future of Data Science

Earlier this year, I came up with a pithy definition of a data scientist as someone who is better at statistics than any software engineer, and better at software engineering than any statistician. I posted a version of that definition to Twitter in May, and it has proved exceedingly popular, having been retweeted over 500 times as of the first week of November.

Given the present shortage of data scientists, it's helpful to have a concise way to talk about the skills that we need people to have in order to fill the role. But how does someone go about picking up this peculiar mix of skills? In the conversations I've had with my fellow data scientists about our own personal paths into our profession, a few themes have appeared over and over again. The first, and by far the most common, is a strong sense of curiosity. Data scientists were the kids who could never seem to stop asking, "Why?" Not surprisingly, the vast majority of data scientists I've worked with have some kind of graduate-level training in a scientific or mathematical discipline.

The second theme was the experience of working on a problem that required them to learn how to use tools and methods that were outside of their comfort zones. This could be anything from a biologist who started learning about search engine technology in order to better manage genome sequencing data to a software engineer fighting a spam problem who rediscovers signal detection algorithms that were developed over 20 years before. In all of these cases, the curiosity that motivated them to find a solution to their problem also led them to learn the techniques and the language of other disciplines, broadening their intellectual horizons and improving their communication skills.

I am a firm believer that project-based training, where a student gets hands-on experience with every step of the process of solving problems with data, is the best way to teach the next generation of data scientists. The role will always require an interdisciplinary mix of skills, and there is no requirement for a data scientist to be the world's best software engineer or the best statistician. Rather, we need people who have the knowledge and experience of every stage of the data project lifecycle, from exploring a new dataset to evaluating the performance of a machine learning model, along with the wisdom to know when they need to look for guidance and ask for help.

Going forward, I am most excited about the prospect of data scientists who came to web companies from fields like neuroscience and sociology returning to their original disciplines and teaching their colleagues some of the skills they have learned. I believe that we have a duty to build tools that offer users simple interfaces that they can use to ask bigger questions and receive sophisticated answers. In the process of building these tools, we will discover new patterns and abstractions that will prove to be just as powerful as the spreadsheets and relational databases that have served us so well for the past 40 years.


By Josh Wills, Data Scientist, Cloudera


