From the Engineering Frontlines: Rewriting Nested Table

Filed in Technology

Nested table is Facet’s main visualization. It offers many features for exploring the hierarchical data returned by Druid (Druid is an open source, distributed analytics data store). The first version was built during Facet’s early days, back when it was still known by its internal codename. It had several flaws, the biggest being that it relied heavily on D3 for DOM generation without offering enough abstraction between the higher levels of Facet (powered by AngularJS) and the D3 bits themselves. This had a significant impact on our ability […]


Five Principles for Designing Real-Time Data Pipelines

Filed in Announcement, Technology

In the course of onboarding dozens of clients and petabytes of data to Metamarkets’ real-time solution, our Data Engineering team has learned a handful of best practices for ETL. That’s why today we’re announcing the release of our Real-Time Integration SDK, a set of open source tools that incorporate these learnings and make it even easier to get up and running with Metamarkets. In the last month alone, it has helped several clients speed through the complex process of building a real-time data integration at petabyte scale. The SDK includes a Java client library for posting data to our real-time API, […]


Simplicity, stability, and transparency: how Samza makes data integration a breeze

Filed in Technology

Last summer, we blogged about the evolution of our Druid data integration pipelines from batch to real-time. Earlier this year, we went through another change: switching the stream processing component to Samza, largely because of its strong operational characteristics. Switching out the stream processor mid-flight involved both development time and operational risk, so I’m glad to report that it turned out to be worth the effort. Even though we made the leap mostly for operational reasons, we also appreciated Samza’s simple programming model and its approach to state management. Since we’ve now been using it for a few months, I […]


Effect of Frequency Governor on Java Benchmarking

Filed in Technology

A very common tool in a programmer’s arsenal is a MacBook Pro (MBP). However, a major drawback of OS X for a developer is the lack of easy, fine-grained control over kernel behavior, such as that available on machines running Linux or a plain BSD kernel. In this post, we will explore the effect of the frequency governor on MBPs running a modern Intel chip. The wall-clock execution time of Druid queries will serve as a simple Java benchmark. One of the most common tasks when evaluating code is looking for key bottlenecks in execution time. […]


Druid Gets Open Source-ier Under the Apache License

Filed in Corporate, Technology

In the process of building a SaaS solution for our clients, we’ve had to invent a handful of technology components that didn’t previously exist. One of these components is Druid, the streaming data store we’ve been operating, scaling, and building for the past three years. Today, Metamarkets runs a multi-thousand core Druid cluster that manages over 10 trillion events and ingests another 50 billion events daily. Since we open sourced Druid in 2012, it’s become an emerging standard for exploratory analytics on massive data sets. Right now, more than a dozen companies use it across multiple business areas (ad tech, […]
