Democratizing Data Science (Data to the People!)
March 6th, 2012 Ken Chestnut
Last week, I attended O’Reilly’s Strata Conference in Santa Clara, CA. The theme of this conference was ‘Making Data Work’. With all of the great topics and speakers expected, I was most excited about the ‘Deep Data’ track on the first day. I was looking forward to hearing first-hand from some of the leading experts in the data science field.
The day started with a great overview by Michael Rhys (Microsoft). He explained what SQL and NoSQL could learn from one another and how they are in fact complementary, despite the broad perception otherwise. We wrapped up the day with a heated-but-friendly Oxford-style debate with Peter Skomoroch (LinkedIn), Michael Driscoll (Metamarkets), DJ Patil (Greylock Partners), Amy Heineike (Quid), Pete Warden (Jetpac), and Toby Segaran (Google). The teams argued whether domain-specific knowledge or pure machine-learning skills mattered most when hiring your first data scientist. Throughout the day, there were other insightful presentations, including those from Monica Rogati (LinkedIn) and Matt Biddulph (Product Club). Although many of the topics were over my head, it was a great opportunity to interact with folks who are passionate about data science and at the top of their craft.
After attending Strata, I began reflecting on the current state of data science: if you are LinkedIn, Twitter, Facebook, Google, Zynga, etc., you have both the data and the prestige to attract, hire, and groom the top data scientists in the industry. But what if you are a telecom, manufacturing, or retail company based on the East Coast or in the Midwest? What if you are a small startup based in Silicon Valley that does not yet have the interesting data sets or the name recognition of these hot companies? What are you supposed to do?
Sure, you can attempt to replicate the big data infrastructure stack these companies use, but do you have the resources or the charter to do so (versus focusing on simply keeping your site up and shipping product)? Even if you can build or buy a robust platform for big data analytics, will you be able to attract, train, and retain the highly skilled data scientists necessary to mine your company’s data for important new insights?
We are entering a world of ‘haves’ and ‘have nots’ when it comes to data science. The ‘haves’ view their growing data sets as an asset to be explored for new discoveries (and they have the data scientists to do it). The ‘have nots’ see their data as a liability to be stored only to satisfy their internal data retention policies. They view data exploration as a costly, perilous journey with little upside or reward.
Having worked at both application and infrastructure enterprise software companies, I was most excited, when I joined Metamarkets a month or so ago, by the tremendous opportunity we have to address the needs of the data science ‘have nots’.
While there is tremendous value in data scientists using our solution for initial data exploration (in the same way you would conduct initial oil exploration before investing the resources and capital for deep sea drilling), I believe our bigger opportunity is in helping those companies and industries that have neither the means, the mission, nor the data scientists to build, maintain, and operate a big data analytics platform on their own.
Our mission is to level the playing field, so that any company – no matter its size, resources, or technology prowess – can view its data as a core asset to be cultivated and mined for increased revenues, improved usage, and better efficiencies. In other words, we think they should love data as much as the data scientists we met at Strata.
Metamarkets is at the forefront of delivering data science-as-a-service. ‘Mere mortals’ should be able to converse with their company’s data in a way that is digestible and actionable in support of everyday decisions. While we have some early indications of success, we are just getting started.
To companies everywhere, we say: don’t fear your data, embrace data science.
Love your data.