I'd like to acknowledge Xavier Léauté for his extensive contributions (in particular, for suggesting several algorithmic improvements and for his work on the implementation), helpful comments, and fruitful discussions. Featured image courtesy of CERN. Many businesses care about accurately computing quantiles over their key metrics, a task that poses several interesting challenges at scale. For example, many service level agreements hinge on these metrics, such as guaranteeing that 95% of queries return in under 500 ms. Internet service providers routinely use burstable billing, a fact that Google famously exploited to transfer terabytes of data across the US for free. Quantile calculations just involve sorting the data, which can be […]
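As a rough illustration of the sorting-based calculation the excerpt above mentions, here is a minimal sketch of an exact quantile using the nearest-rank method, applied to the 95% under 500 ms example. The function names and the sample latencies are hypothetical, and this is the naive in-memory approach, not the method the full post goes on to describe.

```python
import math


def quantile_nearest_rank(values, q):
    """Exact q-quantile (0 < q <= 1) via sorting, nearest-rank definition."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    ordered = sorted(values)             # O(n log n); needs all data in memory
    rank = math.ceil(q * len(ordered))   # 1-based rank of the quantile
    return ordered[rank - 1]


def meets_sla(latencies_ms, q=0.95, threshold_ms=500):
    """True if at least a fraction q of the queries returned under the threshold."""
    return quantile_nearest_rank(latencies_ms, q) < threshold_ms


# Hypothetical query latencies in milliseconds.
latencies_ms = [120, 80, 450, 300, 95, 610, 210, 330, 150, 275]
p95 = quantile_nearest_rank(latencies_ms, 0.95)
sla_ok = meets_sla(latencies_ms)
```

Sorting requires holding every observation at once, which is what makes this approach hard to apply directly to very large or distributed data sets.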
July 9th, 2013 • Chris Allison
Filed in Corporate, Technology
With hundreds of ad-tech companies competing for market share and seeking to distinguish themselves, product managers are constantly weighing the costs and benefits of introducing new product features. Often, the challenge isn’t how to build the functionality, but rather how to prioritize items on their roadmaps and deploy their resources. In many cases, a product feature will fall outside the scope of the enterprise's core mission, so managers ultimately need to decide whether to outsource the feature or dedicate valuable manpower to building it internally. With limited resources and bandwidth, analytics-based product enhancements always seem […]
October 24th, 2012 • Eric Tschetter
Filed in Announcement, Druid, Technology
In April 2011, we introduced Druid, our distributed, real-time data store. Today I am extremely proud to announce that we are releasing the Druid data store to the community as an open source project. To mark this special occasion, I wanted to recap why we built Druid, and why we believe there is broader utility for Druid beyond Metamarkets’ analytical SaaS offering.
September 21st, 2012 • Fangjin Yang
Filed in Algorithms, Druid, Technology
The Metamarkets solution allows for arbitrary exploration of massive data sets. Powered by Druid, our in-house distributed data store and processor, users can filter time series and top list queries based on Boolean expressions of dimension values. Given that some of our dataset dimensions contain millions of unique values, the subset that matches a particular filter expression may be quite large. To address these challenges, we needed a fast and accurate (not a fast and approximate) solution, and we once again found ourselves buried under a stack of papers, looking for an answer.
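To make the idea of filtering on Boolean expressions of dimension values concrete, here is a minimal sketch that assumes an inverted index mapping each (dimension, value) pair to the set of matching row ids. The tuple-based expression format, the function names, and the sample rows are all hypothetical, and plain Python sets stand in for whatever structure Druid actually uses; the excerpt does not say.

```python
from collections import defaultdict


def build_index(rows):
    """Map each (dimension, value) pair to the set of row ids containing it."""
    index = defaultdict(set)
    for row_id, row in enumerate(rows):
        for dim, value in row.items():
            index[(dim, value)].add(row_id)
    return index


def evaluate(expr, index, all_rows):
    """Evaluate a nested ("and" | "or" | "not" | "eq", ...) expression exactly."""
    op = expr[0]
    if op == "eq":
        _, dim, value = expr
        return set(index.get((dim, value), set()))
    if op == "and":
        result = set(all_rows)
        for sub in expr[1:]:
            result &= evaluate(sub, index, all_rows)
        return result
    if op == "or":
        result = set()
        for sub in expr[1:]:
            result |= evaluate(sub, index, all_rows)
        return result
    if op == "not":
        return set(all_rows) - evaluate(expr[1], index, all_rows)
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical rows with two dimensions.
rows = [
    {"country": "US", "device": "mobile"},
    {"country": "US", "device": "desktop"},
    {"country": "DE", "device": "mobile"},
    {"country": "FR", "device": "tablet"},
]
index = build_index(rows)
all_rows = set(range(len(rows)))

# (country = US OR country = DE) AND NOT device = desktop
expr = ("and",
        ("or", ("eq", "country", "US"), ("eq", "country", "DE")),
        ("not", ("eq", "device", "desktop")))
matching = evaluate(expr, index, all_rows)
```

The result is exact rather than approximate, but with millions of unique values per dimension the per-value sets can grow large, which is the cost the paragraph above points to.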