Autoscaling is your friend. If you're not leveraging it (across multiple availability zones), you're doing it wrong. Even single instances can be launched in autoscaling groups with a desired capacity of 1 to ensure that if the instance falls over, a new one is spun up.
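A minimal sketch of the single-instance group described above. It assumes the request would go to boto3's Auto Scaling client; the group name, launch configuration name, and zone names are placeholders, not anything from this thread. The helper only builds the request, so it can be inspected without AWS credentials.

```python
def single_instance_asg_request(group_name, launch_config_name, zones):
    """Build keyword arguments for a self-replacing, single-instance group."""
    if len(zones) < 2:
        # Spanning zones lets the replacement come up elsewhere if one zone fails.
        raise ValueError("list at least two availability zones")
    return {
        "AutoScalingGroupName": group_name,
        "LaunchConfigurationName": launch_config_name,
        "MinSize": 1,
        "MaxSize": 1,
        "DesiredCapacity": 1,  # exactly one instance, replaced if it dies
        "AvailabilityZones": list(zones),
        "HealthCheckType": "EC2",
        "HealthCheckGracePeriod": 300,  # seconds; an arbitrary example value
    }


# Hypothetical names, for illustration only.
request = single_instance_asg_request(
    "example-single-node", "example-launch-config", ["us-east-1a", "us-east-1b"]
)
# With credentials configured, this could be passed to boto3:
#   boto3.client("autoscaling").create_auto_scaling_group(**request)
```

Note that instance-local data does not survive a replacement, so this only suits nodes that can rebuild their state on boot.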
Point 2: AWS is likely trying to rotate capacity for updates, which means they need to evict instances that are running on doms that they need to update/deprecate/etc. The longer your instances are running (or the more specialized the type of instance is), the more likely you'll see an eviction notice. It's good practice to launch new instances often as new AMIs become available, or as private AMIs are updated for security patches, etc. - at least monthly! Autoscaling and solid config management simplify this practice greatly.
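To make the "at least monthly" habit concrete, here is a rough sketch that picks out instances older than a cutoff. The instance IDs and dates are made-up sample data; in practice the launch times would come from whatever inventory you keep (for example the LaunchTime field EC2 reports for each instance).

```python
from datetime import datetime, timedelta, timezone


def instances_due_for_rotation(launch_times, now, max_age_days=30):
    """Return IDs of instances running longer than max_age_days, oldest first."""
    cutoff = now - timedelta(days=max_age_days)
    stale = [(launched, iid) for iid, launched in launch_times.items() if launched < cutoff]
    return [iid for launched, iid in sorted(stale)]


# Hypothetical inventory, for illustration only.
now = datetime(2020, 6, 30, tzinfo=timezone.utc)
launch_times = {
    "i-aaaa1111": datetime(2020, 1, 15, tzinfo=timezone.utc),
    "i-bbbb2222": datetime(2020, 6, 20, tzinfo=timezone.utc),
    "i-cccc3333": datetime(2020, 5, 1, tzinfo=timezone.utc),
}
due = instances_due_for_rotation(launch_times, now)
print(due)
```

Anything on that list is a candidate for relaunching from a current AMI.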
They are not rotating capacity for updates. They are patching a Xen security issue that will be announced on Oct 1. That is why they are rebooting machines and not forcing moves off of those machines. Otherwise, I agree with the advice.
all of our instances scheduled for maint. are indeed system-reboot event types; it is indeed a stop/start situation.
stop/start can possibly put you on another physical host - but it all depends on how aws has set up the hypervisors and their instance schedulers.
this, the maint from aws, appears to be security related - but that doesn't mean that aws is not getting folks off of old hardware if they have that desire.
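For anyone wanting to check which event types their own instances have, a small sketch follows. It assumes a response shaped like the one EC2's DescribeInstanceStatus call returns (InstanceStatuses, each with an Events list carrying a Code such as system-reboot or instance-retirement). The sample response is hand-written for illustration, not real output.

```python
def scheduled_events(status_response, codes=None):
    """Return (instance_id, code, not_before) for each scheduled event.

    If codes is given, only events whose Code is in it are returned.
    """
    found = []
    for status in status_response.get("InstanceStatuses", []):
        for event in status.get("Events", []):
            if codes is None or event["Code"] in codes:
                found.append((status["InstanceId"], event["Code"], event.get("NotBefore")))
    return found


# Hand-written sample in the assumed response shape.
sample = {
    "InstanceStatuses": [
        {"InstanceId": "i-aaaa1111", "Events": [
            {"Code": "system-reboot", "NotBefore": "2020-01-01T02:00:00Z"}]},
        {"InstanceId": "i-bbbb2222", "Events": []},
        {"InstanceId": "i-cccc3333", "Events": [
            {"Code": "instance-retirement", "NotBefore": "2020-01-05T00:00:00Z"}]},
    ]
}
reboots = scheduled_events(sample, codes={"system-reboot", "instance-reboot"})
print(reboots)
```

With boto3 the input would presumably come from ec2.describe_instance_status(); a retirement code rather than a reboot code would be the sign that instances are being moved off hardware.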
I think it's more complicated than that, because Amazon is claiming that if you let them handle it, you keep your instance data. That's not a stop-start (at least not the kind we ordinary users can do).
It's more like a system restart with a little downtime managed by them.
You can try a stop-start yourself, but it's not guaranteed to help. And a restart yourself doesn't do anything.
This helps services, but at some point you have to run a database layer too ;) Cassandra helps, but it's not the whole story during large-scale, closely spaced reboots like these.
It ends up being a decent bit of manual operator time spent when security patches force AWS to reboot. You have to be pretty careful about any service that can't lose members as quickly due to bootstrap times or technology limitations, e.g. RDBMSes.
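One way to cut down the manual checking is to flag clusters where scheduled reboots land too close together. This is only a sketch under assumptions: the cluster names, node names, times, and the one-hour minimum gap are all invented, and a real bootstrap time would need to be measured per service.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def risky_reboot_overlaps(schedule, min_gap=timedelta(hours=1)):
    """Map each cluster to pairs of consecutive reboots closer than min_gap.

    schedule is a list of (cluster, node, reboot_time) tuples.
    """
    by_cluster = defaultdict(list)
    for cluster, node, when in schedule:
        by_cluster[cluster].append((when, node))

    risky = {}
    for cluster, entries in by_cluster.items():
        entries.sort()  # order by reboot time
        pairs = [
            (a_node, b_node)
            for (a_time, a_node), (b_time, b_node) in zip(entries, entries[1:])
            if b_time - a_time < min_gap
        ]
        if pairs:
            risky[cluster] = pairs
    return risky


# Hypothetical schedule, for illustration only.
schedule = [
    ("cassandra", "cass-1", datetime(2020, 1, 1, 2, 0)),
    ("cassandra", "cass-2", datetime(2020, 1, 1, 2, 30)),
    ("cassandra", "cass-3", datetime(2020, 1, 1, 6, 0)),
    ("postgres", "pg-primary", datetime(2020, 1, 1, 3, 0)),
    ("postgres", "pg-replica", datetime(2020, 1, 2, 3, 0)),
]
risky = risky_reboot_overlaps(schedule)
print(risky)
```

Any pair it reports is a candidate for rebooting one node yourself ahead of the scheduled window.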
For things running in an ASG, it's trivial to let it die or just kill it.
Autoscaling is your friend, but you can also use "auto-healing" if your stack is built on Amazon's AWS OpsWorks and you just want to keep a single instance alive. It will automatically spawn a replacement instance and reattach and mount any EBS volumes.
My understanding is that that doesn't work correctly in the case of AZ failure; the EBS data isn't duped to another AZ and so your instance will fail to come up. So it's not really a solution.
Quite possible. But it depends on what level of disaster you want to protect yourself from. Single EC2 instance termination is much more common than an entire availability zone going down. I'd say OpsWorks auto-healing is better than just running a standalone EC2 instance, and configuring a full auto-scaling setup with your own custom AMIs and boot scripts is even better, but also much more work.
Yes, isn't it wonderful that you have to completely remake your autoscaling groups every time you add/remove/modify an ELB? Thank Cthulhu (or Opscode...) for Chef.
Good Luck!