Danny Yuan, Cloud System Architect at Netflix, and I recently co-presented at the Strata Conference in Santa Clara. The presentation discussed how Netflix engineers leverage Druid, Metamarkets’ open-source, distributed, real-time, analytical data store, to ingest 150,000 events per second (billions per day), equating to about 500 MB/s of data at peak (terabytes per hour) while still maintaining real-time, exploratory querying capabilities. Before and after ...
Filed in “Druid”
Meet the Druid and Find Out Why We Set Him Free
April 26th, 2013 • Steve Harris
Filed in Druid
Before jumping straight into why Metamarkets open-sourced Druid, I thought I would give a brief overview of what Druid is and how it came about. For more details, check out the Druid white paper. We are lucky to be developing software in a period of extreme innovation. Fifteen years ago, if a developer or ops person went into his or her boss's office ...
15 Minutes to Live Druid
April 3rd, 2013 • Jaypal Sethi
Filed in Druid
Big Data reflects today’s world, where data-generating events are measured in the billions and business decisions based on insight derived from this data are measured in seconds. There are few tools that provide deep insight into both live and stationary data as business events are occurring; Druid was designed specifically to serve this purpose. ...
Introducing Druid: The Real-Time Analytics Data Store
October 24th, 2012 • Eric Tschetter
Filed in Announcement, Druid, Technology
In April 2011, we introduced Druid, our distributed, real-time data store. Today I am extremely proud to announce that we are releasing the Druid data store to the community as an open source project. To mark this special occasion, I wanted to recap why we built Druid, and why we believe there is broader utility for Druid beyond Metamarkets’ ...
Maximum Performance with Minimum Storage: Data Compression in Druid
September 21st, 2012 • Fangjin Yang
Filed in Algorithms, Druid, Technology
The Metamarkets solution allows for arbitrary exploration of massive data sets. Powered by Druid, our in-house distributed data store and processor, users can filter time series and top list queries based on Boolean expressions of dimension values. Given that some of our dataset dimensions contain millions of unique values, the subset of things that may match a particular filter expression ...