Danny Yuan, Cloud System Architect at Netflix, and I recently co-presented at the Strata Conference in Santa Clara. The presentation discussed how Netflix engineers use Druid, Metamarkets’ open-source, distributed, real-time analytical data store, to ingest 150,000 events per second (billions per day), equating to about 500 MB/s of data at peak (terabytes per hour), while still maintaining real-time, exploratory querying capabilities. Before and after ...
Posts by Eric Tschetter
Introducing Druid: The Real-Time Analytics Data Store
October 24th, 2012 • Eric Tschetter
Filed in Announcement, Druid, Technology
In April 2011, we introduced Druid, our distributed, real-time data store. Today I am extremely proud to announce that we are releasing the Druid data store to the community as an open source project. To mark this special occasion, I wanted to recap why we built Druid, and why we believe there is broader utility for Druid beyond Metamarkets’ ...
Scaling the Druid Data Store
January 19th, 2012 • Eric Tschetter
Filed in Druid, Technology
"Give me a lever long enough... and I shall move the world." (Archimedes) Parallelism is computing’s leverage, a force multiplier acting against the weight of big data. Cloud-hosted, horizontally scalable systems have the power to move even planetary-sized data sets with speed. This blog post discusses our efforts to lift one such data set, achieving a scan rate ...
Druid, Part Deux: Three Principles for Fast, Distributed OLAP
May 20th, 2011 • Eric Tschetter
Filed in Druid, Technology
In a previous blog post we introduced the distributed indexing and query processing infrastructure we call Druid. In that post, we characterized the performance and scaling challenges that motivated us to build this system in the first place. Here, we discuss three design principles underpinning its architecture. 1. Partial Aggregates + In-Memory + Indexes => Fast Queries: We work with ...
Introducing Druid: Real-Time Analytics at a Billion Rows Per Second
April 30th, 2011 • Eric Tschetter
Filed in Druid, Technology
Here at Metamarkets we have developed a web-based analytics console that supports drill-downs and roll-ups of high-dimensional data sets – comprising billions of events – in real time. This is the first of two blog posts introducing Druid, the data store that powers our console. Over the last twelve months, we tried and failed to achieve scale and speed with ...