Making the Most of EBS Volumes with Reattachment
July 29th, 2016 Jeff Strunk
Here at Metamarkets, we are always seeking to use our AWS resources more efficiently and to reduce the human intervention required to address failures. We run some large workloads that use a lot of EBS volumes and, in some cases, spot instances. To facilitate this, we’ve written ER, a system that lets our instances attach, detach, and create EBS volumes at boot without relying on any external coordinator other than the EC2 API.
Pain Point: Kafka Brokers
We run a few Kafka clusters that store their data on EBS volumes, which lets us replace instances without losing days’ worth of real-time customer data. Prior to implementing ER, managing dead instances was a very manual process. Now we only need human intervention when volumes fail.
Pain Point: Improve Spot Efficiency and Reliability
Metamarkets’ Druid cluster keeps multiple copies of data in different tiers for high availability and to have enough CPUs to meet processing needs. We found that we could meet our availability requirements and save money by running a tier in the EC2 Spot market. A big concern was how long it would take to reload the segments from S3 when a replacement instance comes up. If we store the segments on a reusable EBS volume, then we reduce recovery time from hours to minutes.
We can also attach different numbers of volumes to match storage to the different CPU and memory capacities of different instance types.
How Does It Work?
We define a pool of volumes for each cluster. All of the EBS volumes are tagged with the pool name. We configure ER on each instance by passing settings in the EC2 user data. The most important settings are the pool name, volume size, volume type, and how many volumes to attach. We can also configure filesystem creation parameters and mount parameters.
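To make the settings concrete, here is a minimal sketch of how such a configuration could be represented and validated. The JSON format and every field name below are assumptions made for illustration; the post does not describe ER's actual user data format.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class PoolConfig:
    """Settings for one volume pool. All field names are hypothetical."""
    pool_name: str
    volume_size_gb: int
    volume_type: str
    volume_count: int
    mkfs_options: List[str] = field(default_factory=list)
    mount_options: List[str] = field(default_factory=list)


def parse_user_data(raw: str) -> PoolConfig:
    """Parse a JSON user data document into a validated PoolConfig."""
    config = PoolConfig(**json.loads(raw))
    if config.volume_count < 1:
        raise ValueError("volume_count must be at least 1")
    if config.volume_size_gb < 1:
        raise ValueError("volume_size_gb must be at least 1")
    return config
```

A caller would pass in the raw user data string, for example `parse_user_data('{"pool_name": "kafka-a", "volume_size_gb": 500, "volume_type": "gp2", "volume_count": 2}')`. Unknown keys raise a `TypeError`, which surfaces typos at boot rather than silently ignoring them.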
All instances are launched with an IAM instance profile that gives them the following permissions:
When an instance starts up, ER gets a list of all volumes in the same AZ as the instance that have the matching pool tag, no blacklist tag, and a status of “available.” ER loops through the available volumes and attempts to attach, tag, quickly verify the filesystem status of, and mount as many volumes as the configuration asks for. If no existing volumes are suitable, it creates new volumes, makes filesystems on them, and mounts them.
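The startup logic can be sketched roughly as follows. This is not ER's code: the tag keys are hypothetical, the volume dictionaries are assumed to have the shape returned by the EC2 DescribeVolumes call, and the attach and create steps are passed in as callables so the selection logic can be read (and exercised) without touching AWS.

```python
from typing import Callable, Dict, List

POOL_TAG = "er-pool"            # hypothetical tag key
BLACKLIST_TAG = "er-blacklist"  # hypothetical tag key


def tags_of(volume: Dict) -> Dict[str, str]:
    """Flatten EC2's [{'Key': k, 'Value': v}, ...] tag list into a dict."""
    return {tag["Key"]: tag["Value"] for tag in volume.get("Tags", [])}


def eligible_volumes(volumes: List[Dict], az: str, pool: str) -> List[str]:
    """Return IDs of available, non-blacklisted pool volumes in this AZ."""
    result = []
    for volume in volumes:
        tags = tags_of(volume)
        if (volume["AvailabilityZone"] == az
                and volume["State"] == "available"
                and tags.get(POOL_TAG) == pool
                and BLACKLIST_TAG not in tags):
            result.append(volume["VolumeId"])
    return result


def acquire_volumes(volumes: List[Dict], az: str, pool: str, desired: int,
                    try_attach: Callable[[str], bool],
                    create_volume: Callable[[], str]) -> List[str]:
    """Reuse up to `desired` existing volumes, creating new ones as a fallback."""
    acquired: List[str] = []
    for volume_id in eligible_volumes(volumes, az, pool):
        if len(acquired) >= desired:
            break
        # try_attach stands in for attach + tag + filesystem check + mount.
        # It returns False if any of those steps fails for this volume.
        if try_attach(volume_id):
            acquired.append(volume_id)
    while len(acquired) < desired:
        # create_volume stands in for create + mkfs + mount of a new volume.
        acquired.append(create_volume())
    return acquired
```

In a real deployment the filtering would more likely be pushed into the API request itself, but doing it client-side keeps the four conditions from the paragraph above visible in one place.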
On shutdown, a script unmounts and detaches all EBS volumes. On spot instances, our spot instance termination notice handler calls that same script.
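For readers unfamiliar with termination notices, here is a minimal sketch of what such a handler could look like. It assumes the instance metadata endpoint is reachable without a session token and that it returns 404 until a termination has been scheduled; the `release` callable is a hypothetical stand-in for the unmount-and-detach script, and the polling interval is an arbitrary choice.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

TERMINATION_URL = (
    "http://169.254.169.254/latest/meta-data/spot/termination-time"
)


def fetch_termination_time(url: str = TERMINATION_URL,
                           timeout: float = 2.0) -> Optional[str]:
    """Return the scheduled termination time, or None if there is no notice."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8").strip()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no termination scheduled yet
            return None
        raise


def wait_for_notice(release: Callable[[], None],
                    fetch: Callable[[], Optional[str]] = fetch_termination_time,
                    interval: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep,
                    max_polls: Optional[int] = None) -> bool:
    """Poll for a notice; call `release` once when one appears.

    Returns True if a notice was seen, False if max_polls ran out first.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        if fetch() is not None:
            release()
            return True
        polls += 1
        sleep(interval)
    return False
```

Taking `fetch` and `sleep` as parameters is only there to keep the loop testable off-instance; a production handler would simply run `wait_for_notice(run_detach_script)` as a background service.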
- Possible Race Conditions – EC2 prevents two or more instances from attaching the same volume at the same time. One will succeed, and the rest will fail. If ER fails to attach a volume, it updates its list and tries a new one.
- Corrupt Filesystems – ER runs a filesystem check with a 60-second timeout to make sure the filesystem is in a good state. If the check or the mount fails, ER tags the volume with a blacklist tag containing the error message for a human to handle, detaches it, and tries another volume.
- EC2 API Rate Limiting – ER handles rate limit and throttling errors with exponential backoff and retries.
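Two of the safeguards above, the time-limited filesystem check and the retry with exponential backoff, can be sketched as below. This is an illustration under assumptions rather than ER's implementation: `ThrottledError` is a hypothetical stand-in for whatever throttling error the EC2 client raises, the check command is supplied by the caller, and the attempt count and delays are arbitrary defaults.

```python
import subprocess
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for an EC2 rate limit or throttling error."""


def check_filesystem(command: List[str],
                     timeout: float = 60.0) -> Optional[str]:
    """Run a check command; return None if clean, else an error message.

    The returned message is what would be stored in the blacklist tag.
    """
    try:
        result = subprocess.run(command, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "check timed out after %.0f seconds" % timeout
    if result.returncode != 0:
        return "check exited with status %d" % result.returncode
    return None


def with_backoff(call: Callable[[], T], max_attempts: int = 6,
                 base_delay: float = 1.0, max_delay: float = 60.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` on ThrottledError, doubling the delay after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(min(max_delay, base_delay * (2 ** attempt)))
    raise AssertionError("unreachable")
```

With the defaults, a call that is throttled three times in a row waits 1, 2, and then 4 seconds before its fourth attempt. Capping the delay at `max_delay` keeps a long throttling episode from stalling boot indefinitely between attempts.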
At this time, we only support pools of volumes in one AZ. In order to support multiple AZs, we could snapshot volumes in other AZs and create a new volume in a new AZ if none were available where the instance is running. We plan to do something similar to manage pools of ENIs for services that do not handle changing IP addresses so gracefully.