
How We Scaled HyperLogLog: Three Real-World Optimizations

Filed in Corporate, Druid, Technology

At Metamarkets, we specialize in converting mountains of programmatic ad data into real-time, explorable views. Because these datasets are so large and complex, we’re always looking for ways to maximize the speed and efficiency with which we deliver them to our clients. In this post, we continue our discussion of the techniques we use to calculate critical metrics such as unique users and device IDs with maximum performance and accuracy. Approximation algorithms are rapidly gaining traction as the preferred way to determine the number of unique elements in high-cardinality sets. In the space of cardinality […]
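
To give a rough sense of the idea this post builds on (an illustrative sketch, not Druid’s production implementation), a bare-bones HyperLogLog-style estimator hashes each element, remembers the longest run of leading zero bits seen in each register, and combines the registers with a harmonic mean:

```python
import hashlib

# Minimal HyperLogLog-style estimator, for illustration only. Druid's
# production implementation adds bias corrections and storage optimizations
# that are not shown here.
class SimpleHLL:
    def __init__(self, p=14):
        self.p = p                     # use 2^p registers
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value):
        # Hash to 64 bits; the first p bits pick a register, the remaining
        # bits determine the rank (position of the leftmost 1-bit).
        h = int(hashlib.md5(str(value).encode()).hexdigest(), 16) & ((1 << 64) - 1)
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)   # standard bias-correction constant
        harmonic = sum(2.0 ** -r for r in self.registers)
        return alpha * self.m * self.m / harmonic

hll = SimpleHLL()
for i in range(100000):
    hll.add("user-%d" % i)
print(int(hll.estimate()))   # within a few percent of 100000
```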

The Art of Approximating Distributions: Histograms and Quantiles at Scale

Filed in Algorithms, Data Visualization, Druid, Technology

I'd like to acknowledge Xavier Léauté for his extensive contributions (in particular, for suggesting several algorithmic improvements and for his work on the implementation), helpful comments, and fruitful discussions. Featured image courtesy of CERN. Many businesses care about accurately computing quantiles over their key metrics, which can pose several interesting challenges at scale. For example, many service level agreements hinge on these metrics, such as guaranteeing that 95% of queries return in < 500ms. Internet service providers routinely use burstable billing, a fact that Google famously exploited to transfer terabytes of data across the US for free. Quantile calculations just involve sorting the data, which can be […]
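
For a sense of the baseline the post starts from (purely illustrative; the post is about the approximate sketches that avoid doing this at scale), the exact, sort-based approach to a quantile is just:

```python
# Exact quantile by sorting -- the naive baseline that approximate
# histogram/quantile sketches replace at scale.
def exact_quantile(values, q):
    """Return the q-th quantile (0 <= q <= 1) using a nearest-rank index."""
    ordered = sorted(values)
    idx = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[idx]

# Checking a "95% of queries return in < 500ms" style SLA on some sample
# latencies (milliseconds).
latencies_ms = [120, 340, 95, 480, 610, 205, 450, 330, 270, 510]
p95 = exact_quantile(latencies_ms, 0.95)
print(p95, p95 < 500)
```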

Real Real-Time. For Real.

Filed in Druid

Danny Yuan, Cloud System Architect at Netflix, and I recently co-presented at the Strata Conference in Santa Clara. The presentation discussed how Netflix engineers leverage Druid, Metamarkets’ open-source, distributed, real-time, analytical data store, to ingest 150,000 events per second (billions per day), equating to about 500MB/s of data at peak (terabytes per hour) while still maintaining real-time, exploratory querying capabilities. Before and after the presentation, we had some interesting chats with conference attendees. One common theme from those discussions was curiosity around the definition of "real-time" in the real world and how Netflix could possibly achieve it at those volumes. This post is […]

Meet the Druid and Find Out Why We Set Him Free

Filed in Druid

Before jumping straight into why Metamarkets open sourced Druid, I thought I would take a brief dive into what Druid is and how it came about. For more details, check out the Druid white paper. We are lucky to be developing software in a period of extreme innovation. Fifteen years ago, if a developer or ops person went into his or her boss's office and suggested using a non-relational/non-SQL/non-ACID/non-Oracle approach to storing data, they would pretty much get sent on their way. All problems at all companies were believed to be solved just fine using relational databases. Skip forward a few years and the […]

15 Minutes to Live Druid

Filed in Druid

Big Data reflects today’s world, where data-generating events are measured in the billions and business decisions based on insight derived from that data are measured in seconds. There are few tools that provide deep insight into both live and stationary data as business events are occurring; Druid was designed specifically to serve this purpose. If you’re not familiar with Druid, it’s a powerful, open-source, real-time analytics database designed to allow queries on large quantities of streaming data – that means querying data as it’s being ingested into the system (see previous blog post). […]
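
As a rough sketch of what “querying data as it’s being ingested” looks like in practice (the broker address, dataSource name, and interval below are placeholders, not details from the post), a Druid query is a small JSON document POSTed to the broker over HTTP:

```python
import json
import urllib.request

# Placeholder broker address and dataSource -- adjust for your deployment.
BROKER_URL = "http://localhost:8082/druid/v2/"

# A simple timeseries query: count events per minute over an interval that
# can include data still being ingested in real time.
query = {
    "queryType": "timeseries",
    "dataSource": "events",
    "granularity": "minute",
    "intervals": ["2013-01-01T00:00/2013-01-01T01:00"],
    "aggregations": [{"type": "count", "name": "rows"}],
}

request = urllib.request.Request(
    BROKER_URL,
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    print(json.loads(response.read()))
```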
