Five Principles for Designing Real-Time Data Pipelines

Filed in Announcement, Technology

In the course of onboarding dozens of clients and petabytes of data to Metamarkets’ real-time solution, our Data Engineering team has learned a handful of best practices for ETL. That’s why today we’re announcing the release of our Real-Time Integration SDK, a set of open source tools that incorporate these lessons and make it even easier to get running with Metamarkets. In the last month alone, it has enabled several clients to accelerate through the complex process of building a real-time data integration at petabyte scale. The SDK includes a Java client library for posting data to our real-time API, […]

Simplicity, stability, and transparency: how Samza makes data integration a breeze

Filed in Technology

Last summer, we blogged about the evolution of our Druid data integration pipelines from batch to real-time. Earlier this year, we went through another change: switching the stream processing component to Samza, largely due to its great operational characteristics. Switching out the stream processor mid-flight involved both development time and operational risk, so I'm glad to report that it turned out to be worth the effort. Even though we made the leap mostly for operational reasons, we also appreciated Samza's simple programming model and its take on state management. Since we’ve now been using it for a few months, I […]

Effect of Frequency Governor on Java Benchmarking

Filed in Technology

A very common tool in a programmer’s arsenal is a MacBook Pro (MBP). However, a major OS X drawback for a developer is the lack of easy, fine-grained control over kernel behavior of the kind available on machines running Linux or BSD. In this post, we will explore the effect of the frequency governor on MBPs with a modern Intel chip. The wall-time query execution speed in Druid will be used as a simple Java benchmark. One of the most common tasks during the course of evaluating code is to look at key bottlenecks in execution time. […]

Druid Gets Open Source-ier Under the Apache License

Filed in Corporate, Technology

In the process of building a SaaS solution for our clients, we’ve had to invent a handful of technology components that didn’t previously exist. One of these components is Druid, the streaming data store we’ve been operating, scaling, and building for the past three years. Today, Metamarkets runs a multi-thousand core Druid cluster that manages over 10 trillion events and ingests another 50 billion events daily. Since we open sourced Druid in 2012, it’s become an emerging standard for exploratory analytics on massive data sets. Right now, more than a dozen companies use it across multiple business areas (ad tech, […]

Introducing Facet, A Revolutionary User Interface for High-Dimensional Data

Filed in Company, Corporate, Technology

In 2011 Metamarkets launched Druid, a streaming datastore capable of processing billions of events in real time. Today we’re proud to release Facet, an interface designed to put Druid’s full power at every user’s fingertips. We built Facet to overcome the inflexibility of traditional reporting and analytics interfaces. While many systems are adept at telling you what is happening, Facet helps you uncover why it’s happening, quickly, intuitively and without the assistance of an analyst or Business Intelligence expert. One of Facet’s secrets is in how it handles the high-dimensional datasets common to the programmatic marketing space. In this post, I’ll […]
