Druid and Spark Together – Mixing Analytics Workflows

Filed in Druid, Technology

One way of looking at the human thought process is that we have different brain workflows for different analytics and data processing needs. At Metamarkets we do the same for our data processing machines. This post explores some of our experience bin-packing Druid query nodes together with Spark batch processing, using Apache Mesos as the resource coordinator.

Thinking Fast

A chart in the full post shows a typical “work” pattern for a Druid historical node throughout a typical weekday (times shown are EDT). It shows the number of CPU seconds the JVM consumes to answer queries. There are […]

Behind the Scenes of our Transition to a Multi-Cloud Environment

Filed in Algorithms, Company, Druid, Technology

Service uptime is the performance metric that determines operational success, and when something fails, the impact can be far-reaching, often affecting a business’s bottom line. One of the downsides of running infrastructure in a public cloud is that we depend on the SLAs provided by our cloud providers. As a startup, we have been upgrading our systems to become much more fault-tolerant, but since our cloud infrastructure footprint is restricted to one region, and the oldest AWS region at that, we are vulnerable to cloud service blackouts and brownouts. The most prominent solution […]

Moving Real-Time Data Flow Across Cloud Providers

Filed in Algorithms, Data Science, Druid, Technology

Eventually, as its data grows, a company needs to make a major migration of data or processes from one physical location to another. This post is the story of how we moved a real-time data flow across cloud providers using Kafka, Samza, and some creative engineering.

History

Our technology stack for data processing is something we’ve spoken about before. We run a Lambda architecture in which the real-time system consists of Kafka and Samza and terminates in Druid real-time indexing tasks. The batch system consists of Spark, which reads from and writes to S3. Druid historical nodes use S3 as […]

Going Multi-Cloud with AWS and GCP: Lessons Learned at Scale

Filed in Algorithms, Druid, Industry, Technology

Metamarkets handles a lot of data. The torrent of data that clients send us surpasses a petabyte a week. At this scale, we need to be able to fail over gracefully, detect and eliminate brownouts, and efficiently operate huge numbers of byte-banging machines. We started and grew Metamarkets in AWS’s us-east region, with the majority of our footprint in a single availability zone (AZ). As we grew, we started to see the side effects of being restricted to one AZ, and then the side effects of being restricted to one region. It’s kind of like inflating a balloon in […]

Benefits of a Full-Stack Solution

Filed in Company, Corporate, Data Visualization, Druid, Our Customers, Technology, Visualization

We have built out our ingestion, processing, and in-memory storage engine at scale, specifically for high-dimensionality advertising data. With this expertise we have perfected the infrastructure that allows you to visualize billions of daily programmatic events in seconds without having to maintain your own technology. What does that mean for our customers? An end-to-end solution that is powerful, fast, and reliable.

Technical Benefits of Our Full-Stack Solution

Real-Time Data Stream

Our customers gain insights into their events immediately after they occur. Event records are retained indefinitely, so you can unify and call up real-time and historical […]