Building a data science team

in #datascience, 7 years ago (edited)



Teaming up developers and scientists into a data science driven department is usually difficult because it is a large up-front investment for a company and is hard to leverage properly. In early-stage startups, the business struggles to understand how the team provides value, while the data scientists struggle to show their value to the business. This post shares my experience, my thoughts and, hopefully, some insights for anybody in a similar situation.

Things being as they are, data science is very interesting, very popular, very expensive and difficult to leverage. If you are in a position where your opinion counts, as a C-level executive, investor or co-founder, I hope the next words will illuminate your path a little. They are the brief written opinion and experience of an insider who has been trying to do this as simply, robustly and cost-efficiently as possible. I have borrowed some ideas from colleagues around the internet, and I recognize their work and thoughts on how to build a data science team.

The beginning.


So you are a startup co-founder and you have heard a thing or two about “Data Science” and that kind of stuff. You realize somehow that you need to have a “data play” in this new era. It sounds good, it is technology, and it has to be present to some extent in the company. Yeah, maybe it has something to do with Artificial Intelligence too, that fancy thing that is all over the internet. Maybe Artificial Intelligence is more powerful and accurate than Natural Intelligence, whatever that means. Investors and clients enjoy the concept and openly show interest in it, so the question is no longer why or how, but when.

So you hire your first data-capable engineer, one whose focus is data-driven development. Yeah, stuff more complex than a very well organized and dynamic Excel workbook, with more data too. Maybe you can ask for something in real time, and of course it has to be programmed and hosted in some visually friendly web app, or run in the back end as an engine that calculates something. There, you have a reasonable idea of what can be done, at least from the “traditional” point of view of late 90’s or early 2000’s technology.

Now you hire your first data scientist. Somebody like a fresh graduate, because people with industry experience or a PhD are too expensive and too hard to find for the early stage of this new department. An engineer with decent side work and projects will do, or maybe a small team of 2 or 3 people with different profiles and experience. Yeah, 3 people is a good number. As Ben Wilde wrote in his article, 9 tips for building data science teams, it’s better to hire teams, not unicorns (the rare engineer/programmer/scientist in one person). Hopefully you will get to hire a few people who get excited about doing Machine Learning and AI, and you and the other stockholders will get excited too. This is going to be awesome!

From academia, to industry, to inadequacies, to conflict.


Your data team starts, ready to take your data and build prediction models from it, foresee the future and take action based on it: increase revenue, lower costs or something in between, all while adding value to the company. One situation is common in this phase of the process, as stated in the article How to hire your first data scientist. Back in academia, science majors and graduate students worked with data sets that were very similar to each other; reproducible work and examples are among the favorite resources of teachers and professors. With a lot of luck, some of those data sets show up in your day-to-day problems. Those data sets have actually been the same for several generations of students, so your hires know them well, but they are not 100% real-life problems. Since your team now faces real-life problems with several constraints, missing data, etc., they will have to create a sort of database on their own and understand it very deeply. No problem! Your startup has some API connections and, of course, the database of clients and their info: plenty of data ready to be mined.

Problems arise.


In your company right now there is no infrastructure for data analysis, so everything has to be done from scratch. Your data science team is also trying to install and configure various tools that are incompatible with the stack or architecture currently in use, although your CTO is willing to cooperate and gradually move things to a framework that is more generally useful for everybody. But for now, the short-term goal is a minimal, feasible transition framework to get this done.

The data team wanted to work on machine learning: those elegant mathematical and statistical models they spent a great part of their careers learning, thinking and coding about. Of course, they expected that a new project would demand some effort in gathering, cleaning and warehousing data. But they didn’t expect to start from absolute 0, or even from -1 (that is the level where back-end connections don’t actually exist and have to be built). They didn’t expect things to be so much more complex and messy, with so much data to search for in the first place. They also didn’t expect to spend so much time on questions about how the data was gathered and why, exactly, the company has to have “data wires” that collect info in all the apps and user-facing environments. Something also has to be said about how little the rest of the company cares that each business line or project has to begin with the question “what data will we need in order to make data-driven decisions in the future, deeply enough, not just spreadsheets?”

The team is now in a position to forget about working on machine learning, at least for the next month or two. They had been warned about spending 80% of their time cleaning and analyzing data; that sounds like a data science standard. Instead, they spend 80% of their time making the connections and begging for data to be fetched, created, accessed, stored, moved and, only after that, explained. The other 20% of their time goes to lobbying the technology or engineering team for data-science-friendly tools and connections, security policies for those connections, and maybe some extra resources to be purchased, like servers, data sources and so on.

The programmers and the other kind of programmers


The technology or engineering team is kind of frustrated, like you. They have to take time away from their own work to handle these requests and to explain that the internal infrastructure is OK for now, but that in the near future (exactly when is still unknown) more data points will be added to gather information from, in order to provide the data science team with its precious commodity: data. By now the technology team may even think of this data science thing as an expensive feature of the company that does nothing useful yet and still manages to constantly complain that the data is not good enough. Maybe it’s not the right time; maybe they need to learn how to do regular technology more deeply and independently before thinking about machine learning... who knows. But wait, that’s exactly the problem: who knows? So, you think, OK, questions are arising among “programmers” about the other kind of “programmers”.

Actually you, the co-founder, are even more frustrated. It’s been several months and the data scientists haven’t even produced a decent dashboard yet, or a piece of software that directly makes revenue for the company. That is much less magic than you originally expected to see performed in front of your eyes, let alone the magic machine learning everybody keeps talking about. Plus, they have a different focus and way of doing things, and they are kind of isolated: “programmers” who do not know the things that other “programmers” do and accomplish, like building your web page and mobile apps. Deep down, you start suspecting that Machine Learning and Artificial Intelligence are just hype, very early-stage hype, at least for your company right now. But you keep this to yourself and tell the other investors and co-founders about the crazy smart people in that department, who work hard and are super excited to develop new kinds of tech and stuff.

...So you have learned a thing or two.

  • Good things and breakthrough developments take time and resources

It’s good to view this as a marathon rather than a sprint. With communication and patience you will have a steadily growing business, both in the short-term, easy-to-understand tech developments and in the complicated, fancy and value-adding ones.

  • New technology can help to answer old questions

It is very useful to look at old problems from a different perspective. An easy example: instead of going directly to the client and gathering information about your service with surveys and phone calls, you can take the log data, social media data and user experience data of your clients and potential clients and analyze it massively, independently and automatically. That will do the job more efficiently and hopefully more accurately, and even at lower cost in the medium term.

  • Invest in the future

As simple as that: invest in having cutting-edge technology, even if it takes months of developing infrastructure and models. Early adopters are the ones who enjoy the upward ride of a new technology, and we are indeed in a new era where scientific programming, machine learning and AI are changing everything.

Also shared on my public LinkedIn profile.
