Managing a Large-scale Spark Cluster with Mesos

Filed in Best Practices, Druid, Technology

At Metamarkets, we ingest more than 100 billion events per day, which are processed both in real time and in batch. Incoming events are stored in our Kafka cluster, where they are consumed by Samza for real-time stream processing and by Spark for batch processing. Some clients send us data in batch only, but we also run batch processing for clients who send data in real time, in order to correct any inaccuracies in the real-time output, including deduplicating events and joining events that fell outside the real-time join window. Batch processing is a two-step operation where […]
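To make the fix-up step concrete, here is a minimal sketch of the two corrections the batch pass performs: deduplication and an unwindowed join. The field names (`id`) and the two event types (impressions and clicks) are illustrative assumptions, not Metamarkets' actual schema; the real pipeline does this in Spark at scale, while this sketch only shows the logic.

```python
from collections import defaultdict


def deduplicate(events):
    """Keep the first occurrence of each event id (the 'id' field is a
    hypothetical unique event key)."""
    seen = set()
    out = []
    for event in events:
        if event["id"] not in seen:
            seen.add(event["id"])
            out.append(event)
    return out


def full_join(impressions, clicks):
    """Join click events to impressions by 'id' with no time window.
    Unlike the real-time path, a batch pass can match arbitrarily late
    events because it sees the whole day's data at once."""
    clicks_by_id = defaultdict(list)
    for click in clicks:
        clicks_by_id[click["id"]].append(click)
    return [(imp, clicks_by_id.get(imp["id"], [])) for imp in impressions]
```

The real-time join would drop a click arriving hours after its impression; the batch version above matches it anyway, which is exactly the "incorrectness" the second pass repairs.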
