Autoscaling Samza with Kafka, Druid and AWS

Filed in Druid, Machine Learning, Technology

At Metamarkets, we receive more than 100 billion events per day, totaling more than 100 terabytes. These events are processed in real-time streams, allowing our clients to visualize and dissect them on our interactive dashboards. This data firehose must be managed reliably without sacrificing cost efficiency. This post will demonstrate how we have implemented scaling models in a turbulent environment to right-size part of our real-time data streams. Our technical stack is based on Kafka, Samza, Spark and Druid and runs on Amazon Web Services. Incoming events go first to Kafka, […]
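
As a rough illustration of that first hop in the pipeline, here is a minimal sketch of producing a single event to Kafka with the standard Java client. The broker address, topic name, and event payload are placeholders for this example, not details taken from the post.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventProducer {
    public static void main(String[] args) {
        // Minimal producer configuration; the broker address is a placeholder.
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-broker:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // One JSON-encoded event; the "events" topic name and schema are hypothetical.
            String event = "{\"timestamp\":\"2016-01-01T00:00:00Z\",\"type\":\"impression\"}";
            producer.send(new ProducerRecord<>("events", event));
        }
    }
}
```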


Interactive Analytics Blog Series: Instant Drill-Down

Filed in Data Science, Druid, Technology

Programmatic marketplaces are continuing to change the game in the buying and selling of online advertising; the data from these marketplaces is enabling an even bigger transformation in marketing itself, fostering data-informed practices everywhere. That has led to increased adoption of a new breed of analytics designed for the fast-paced programmatic age. This is the second in a three-part series about how marketers can make the most of their programmatic data by leveraging interactive analytics, with today’s post focusing on the value of instant drill-down. Instant Drill-Down: Flexible Data Exploration Any marketer can tell you that looking at static […]


Distributing Data in Druid at Petabyte Scale

Filed in Algorithms, Corporate, Data Visualization, Druid, R, Technology

At Metamarkets we run one of the largest production Druid clusters out there, so when it comes to scalability, we are almost always the first ones to encounter the issues of running Druid at scale. Sometimes, however, performance problems are much simpler, and the downside of a large cluster is that it tends to average out problems that are hiding in plain sight, making them harder to pinpoint. Recently, we started noticing that, despite scaling our cluster horizontally, performance would not always increase accordingly. While we don’t expect a linear increase in speed, some of the numbers […]


Behind the Scenes with Metamarkets, Episode 2

Filed in Corporate, Druid, Technology

In the latest episode in our “Behind The Scenes” video series, we sat down with Dr. Charles Allen, a senior software engineer at Metamarkets and one of the leading developers focused on Druid, the open-source database built by the Metamarkets team. Watch the video to hear how Charles first encountered and implemented Druid before coming to Metamarkets, what he sees as the key strengths of the technology, how he’s worked to implement more mutability of data within Druid, and how he’s helping users get maximum utilization of their clusters.


Druid Query Optimization with FIFO: Lessons from Our 5000-Core Cluster

Filed in Druid, Technology

Druid’s Horizontal Scale A great strength of using Druid as a data store and aggregation engine is its ability to scale horizontally. Whenever more data is in the system, or whenever faster compute times are desired, it is simply a matter of throwing more hardware at the problem; Druid auto-detects and auto-balances its workloads. At Metamarkets we are currently ingesting over 3M events/second (replicated) into our Druid cluster and have many hundreds of historical nodes serving this data across multiple tiers. Part of the power of this horizontal scale is how Druid breaks up data into shards. Each […]
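
To picture the shard model, here is a small conceptual sketch, assuming hourly segment granularity and hash partitioning; the names and logic are illustrative only, not Druid’s actual implementation. Each time chunk of a data source is split into numbered partitions, so adding historical nodes simply spreads those shards across more hardware.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class SegmentShardSketch {
    // Conceptual illustration: map an event to a "dataSource_start_end_partition" shard id.
    static String shardFor(String dataSource, Instant eventTime, Object shardKey, int numShards) {
        // Truncate the event timestamp to its hourly time chunk (segment granularity).
        Instant chunkStart = eventTime.truncatedTo(ChronoUnit.HOURS);
        Instant chunkEnd = chunkStart.plus(1, ChronoUnit.HOURS);
        // Hash-partition within the time chunk to pick a shard number.
        int partitionNum = Math.floorMod(shardKey.hashCode(), numShards);
        return dataSource + "_" + chunkStart + "_" + chunkEnd + "_" + partitionNum;
    }

    public static void main(String[] args) {
        // Hypothetical data source and shard key, used only to show the mapping.
        System.out.println(shardFor("wikipedia", Instant.parse("2016-03-01T12:34:56Z"), "user-42", 8));
    }
}
```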
