Dogfooding with Druid, Samza, and Kafka: Metametrics at Metamarkets

Filed in Company, Druid

“Another flaw in the human character is that everybody wants to build and nobody wants to do maintenance.” – Kurt Vonnegut

Every engineer loves the feeling of standing up a new piece of open source infrastructure, satisfaction born from a grueling journey through community forums, outdated documentation, and mostly-uncommented source code. The glory is fleeting, because not 20 minutes into having your shiny new service up and running, a hiccup hits and you’re forced to ask: what am I going to use to monitor and maintain this thing? We’ve now entered an era where software increasingly runs not on a […]

Five Tips for a F’ing Great Logo

Filed in Corporate, Druid

This post originally appeared on Druid.io on July 23, 2014. Everyone wants a great logo, but it’s notoriously difficult work, prone to miscommunications, heated debates and countless revisions. Still, after three years we couldn’t put it off any longer. Druid needed a visual identity, so we partnered with the talented folks at Focus Lab for help. Our old logo (left) was…lacking. Much better now, right? Despite our fears, we cranked this out with Focus in a speedy three-week sprint. Not only was the process drama-free, it was actually fun. The goal of this post is to give you some insight into how […]

Building a Data Pipeline That Handles Billions of Events in Real-Time

Filed in Corporate, Technology

At Metamarkets, our goal is to help our clients make sense of large amounts of data in real time. Our platform ingests tens of billions of new events every day, and currently comprises trillions of aggregated events. Our real-time analytics platform has two separate yet equally important goals: interactivity (real-time queries) and data freshness (real-time ingestion). We’ve written before about how Druid, our open-source datastore, is able to offer fast, interactive queries. In this post, we’re going to focus on the challenges around achieving data freshness. We’ll talk about the batch-oriented pipelines we started with, and how we approached building real-time […]

Open Source Leaders Sound Off on The Rise of the Real-Time Data Stack

Filed in Druid, Technology

In February we were honored to speak at the O’Reilly Strata conference about building a robust, flexible, and completely open source data analytics stack. If you couldn’t make it, you can watch the video here. Preparing for our talk got us thinking about all the brilliant folks working on similar problems, so we organized a panel that same night to continue the conversation. The discussion featured key contributors to several open source technologies: Andy Feng (Storm), Eric Tschetter (Druid), Jun Rao (Kafka), and Matei Zaharia (Spark). It was moderated by VentureBeat Staff Writer Jordan Novet and hosted by Zack Bogue […]

How We Scaled HyperLogLog: Three Real-World Optimizations

Filed in Corporate, Druid, Technology

At Metamarkets, we specialize in converting mountains of programmatic ad data into real-time, explorable views. Because these datasets are so large and complex, we’re always looking for ways to maximize the speed and efficiency of how we deliver them to our clients. In this post, we’re going to continue our discussion of some of the techniques we use to calculate critical metrics such as unique users and device IDs with maximum performance and accuracy. Approximation algorithms are rapidly gaining traction as the preferred way to determine the number of unique elements in high-cardinality sets. In the space of cardinality […]
