Benefits of a Full-Stack Solution

Filed in Company, Corporate, Data Visualization, Druid, Our Customers, Technology, Visualization

We have built our ingestion, processing, and in-memory storage engine out at scale specifically for high-dimensionality advertising data. With this expertise, we have perfected the infrastructure that allows you to visualize billions of daily programmatic events in seconds without having to maintain your own technology. What does that mean for our customers? An end-to-end solution that is powerful, fast, and reliable.

Technical Benefits of Our Full-Stack Solution

Real-Time Data Stream

Our customers gain insights into their events immediately after they occur. Event records are retained indefinitely so you have access to unify and call real-time and historical […]

Managing a Large-scale Spark Cluster with Mesos

Filed in Best Practices, Druid, Technology

At Metamarkets, we ingest more than 100 billion events per day, which are processed in both real time and batch. We store received events in our Kafka cluster, where they are processed by Samza for real-time stream processing and by Spark for batch processing. Some clients send us data in batch only, but we also run batch processing for clients who send us data in real time, in order to correct any errors in the produced data, including deduplicating events and joining events that fell outside the real-time join window. Batch processing is a two-step operation where […]

Autoscaling Samza with Kafka, Druid and AWS

Filed in Druid, Machine Learning, Technology

At Metamarkets, we receive more than 100 billion events per day, totaling more than 100 terabytes. These events are processed in real-time streams, allowing our clients to visualize and dissect them on our interactive dashboards. This data firehose must be managed reliably without sacrificing cost efficiency. This post will demonstrate how we have implemented scaling modeling in a turbulent environment to right-size part of our real-time data streams. Our technical stack is based on Kafka, Samza, Spark and Druid and runs on Amazon Web Services. Incoming events go first to Kafka, […]

Interactive Analytics Blog Series: Instant Drill-Down

Filed in Data Science, Druid, Technology

Programmatic marketplaces are continuing to change the game in buying and selling of online advertising; the data from these marketplaces is enabling an even bigger transformation in marketing itself, fostering data-informed practices everywhere. That has led to increased adoption of a new breed of analytics designed for the fast-paced programmatic age.

This is the second in a three-part series about how marketers can make the most of their programmatic data by leveraging interactive analytics, with today’s post focusing on the value of instant drill-down.

Instant Drill Down: Flexible Data Exploration

Any marketer can tell you that looking at static […]

Distributing Data in Druid at Petabyte Scale

Filed in Algorithms, Corporate, Data Visualization, Druid, R, Technology

At Metamarkets we run one of the largest production Druid clusters out there, so when it comes to scalability, we are almost always the first ones to encounter issues of running Druid at scale. Sometimes, however, performance problems are much simpler, and the downside of a large cluster is that it tends to average out problems that are hiding in plain sight, making them harder to pinpoint. Recently, we started noticing that, despite being able to scale our cluster almost horizontally, performance would not always increase accordingly. While we don’t expect a linear increase in speed, some of the numbers […]
