Simplicity, stability, and transparency: how Samza makes data integration a breeze

Filed in Technology

Last summer, we blogged about the evolution of our Druid data integration pipelines from batch to real-time. Earlier this year, we went through another change: switching the stream processing component to Samza, largely due to its great operational characteristics. Switching out the stream processor mid-flight involved both development time and operational risk, so I’m glad to report that it turned out to be worth the effort. Even though we made the leap mostly for operational reasons, we also appreciated Samza’s simple programming model and its take on state management. Since we’ve now been using it for a few months, I […]

Building a Data Pipeline That Handles Billions of Events in Real-Time

Filed in Corporate, Technology

At Metamarkets our goal is to help our clients make sense of large amounts of data in real-time. Our platform ingests tens of billions of new events every day, and currently comprises trillions of aggregated events. Our real-time analytics platform has two separate yet equally important goals: interactivity (real-time queries) and data freshness (real-time ingestion). We’ve written before about how Druid, our open-source datastore, is able to offer fast, interactive queries. In this post, we’re going to focus on the challenges around achieving data freshness. We’ll talk about the batch-oriented pipelines we started with, and how we approached building real-time […]

Open Source Leaders Sound Off on The Rise of the Real-Time Data Stack

Filed in Druid, Technology

In February we were honored to speak at the O’Reilly Strata conference about building a robust, flexible, and completely open source data analytics stack. If you couldn’t make it, you can watch the video here. Preparing for our talk got us thinking about all the brilliant folks working on similar problems, so we organized a panel that same night to continue the conversation. The discussion featured key contributors to several open source technologies: Andy Feng (Storm), Eric Tschetter (Druid), Jun Rao (Kafka), and Matei Zaharia (Spark). It was moderated by VentureBeat Staff Writer Jordan Novet and hosted by Zack Bogue […]