The Metamarkets solution allows for arbitrary exploration of massive data sets. It is powered by Druid, our in-house distributed data store and processor, which lets users filter time series and top list queries based on Boolean expressions of dimension values. Given that some of our data set dimensions contain millions of unique values, the subset of things that may match a particular filter expression ...
Posts by Fangjin Yang
Maximum Performance with Minimum Storage: Data Compression in Druid
September 21st, 2012 • Fangjin Yang
Filed in Algorithms, Druid, Technology
Fast, Cheap, and 98% Right: Cardinality Estimation for Big Data
May 4th, 2012 • Fangjin Yang
Filed in Algorithms, Druid
The nascent era of big data brings new challenges, which in turn require new tools and algorithms. At Metamarkets, one such challenge is cardinality estimation: efficiently determining the number of distinct elements within a dimension of a large-scale data set. Cardinality estimations have a wide range of applications, from monitoring network traffic to data mining. If leveraged correctly, these ...