You're Not a Data Scientist
Many of my friends, colleagues and contacts have started calling themselves Data Scientists. Judging by the resumes crossing my desk, we’re minting data scientists faster than expected. I’ve seen this movie before. The IT biz has historically rebranded job titles based on whatever is trending: today’s Software Architects were once known as Designers or Systems Engineers. Nothing is trending faster and louder than predictive analytics, machine learning, deep learning and AI, so now it’s the data geeks’ turn to be rebranded as data scientists.
Now don’t get me wrong: some of these folks are legit Data Scientists, but the majority are not. I guess I’m a purist. Calling yourself a scientist indicates that you practice science following a scientific method: you create hypotheses, test them against experimental results and, after proving or disproving the conjecture, move on or iterate.
Data science is an applied science. As an applied scientist you create things (models, methods and algorithms) that provide practical utility. These ‘things’ are valuable because they predict future outcomes from relatively few data inputs. In some cases your models are black-box enigmas: you might not understand how a prediction is derived; you’ve only shown that the models are accurate.
So, in the spirit of maintaining an unadulterated definition of data science, I make the following assertions that might indicate you’re not a data scientist:
- Expertise with the business intelligence stack doesn’t make you a data scientist. You’ve spent much of your time predicting the past by performing time series analysis of historical data. That’s not data science: you rarely perform experiments, and your predictive power is illusory.
- Programming experience with Hadoop, R, Python, Octave, MATLAB and Mathematica means you know data science tools. Tool skills alone don’t give you data science cred.
- An advanced degree in mathematics, statistics or econometrics doesn’t mean you’ve earned the right to call yourself a data scientist. Hopefully you’ve developed the skills to apply descriptive and predictive techniques while maintaining a strong grasp of the underlying theory. But data science is an applied discipline focused on data from a specific subject area, and you most likely didn’t get sufficient real-world experience while pursuing your degree.
- Evangelizing that big data, little data, any data is the future of the predictive enterprise looks relevant on your resume, may get you a few conference speaking gigs and entertains your friends at cocktail parties, BUT you’re not a data scientist. You’re a big data groupie.
- The 8-week course you took on Coursera or the Data Science boot camp you attended no more makes you a data scientist than my recent golf lessons make me a golf pro. I believe in lifelong learning and I’m all for self-improvement, but this is self-delusion.
- You’re a subject matter expert, an Excel wizard capable of creating incredible charts, graphs & pivot tables. Those skills, while valuable, don’t make you a data scientist.
- You’ve recently acquired a data science platform from SAS, IBM or Microsoft. With no prior experience, after reading the manual, watching the 10 intro videos or taking the 5-day training course, you believe you can create predictive/explanatory models of subject matter data by dragging and dropping algorithmic widgets onto a canvas and pressing the ‘LEARN’ button. You’re not a data scientist. In fact, you’re dangerous.
I know this was written in snarky fashion, and I apologize if I’ve offended. But I think it’s time we clearly define what a data scientist is and ‘is not’. I know that I’ve omitted other data science subdisciplines like experimental design, sampling, etc. Maybe next time.
Very interesting
Upvoting. I found myself nodding in agreement for the most part, but I think you're being a bit uptight about labels.
People appropriate what label they will. Some of them will "fake it until they make it" and some will just use it as a flashy label. I think the problem, in some cases, is also how non-practitioners perceive people who do predictive or machine learning work.
I'm fortunate enough to work in the business unit of an organization where I can actually formulate and work on subject area hypotheses and design projects to test those hypotheses with real data. My counterparts in IT shops are more like ML-using data engineers. Do I begrudge them the title because they don't have subject matter expertise? Not really.
For someone who works with data and probabilities, it's not terribly helpful to view the world in black-and-white terms, i.e. data scientist vs. not data scientist. How many project managers in your company do you see who don't manage anything or anyone?
Disclaimer: I leaned on my R programming skills to "fake it til I make it." I had been on the fence about using the title for a long time.