
A couple of points:

Autoscaling is your friend. If you're not leveraging it across multiple availability zones, you're doing it wrong. Even a single instance can be launched in an autoscaling group with a desired capacity of 1, so that if it falls over, a new one is spun up.
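To make the "desired capacity of 1" idea concrete, here is a minimal sketch that only builds the request parameters and does not call AWS. The keys follow boto3's `create_auto_scaling_group` parameter names; the group name, launch configuration name, zones, and the 300-second grace period are made-up values for illustration.

```python
def single_instance_asg_params(name, launch_config, zones):
    """Build keyword arguments for an Auto Scaling group pinned at one instance."""
    if len(zones) < 2:
        # With a single zone, a replacement has nowhere else to land.
        raise ValueError("list at least two availability zones")
    return {
        "AutoScalingGroupName": name,
        "LaunchConfigurationName": launch_config,
        "MinSize": 1,
        "MaxSize": 1,
        "DesiredCapacity": 1,           # keep exactly one instance alive
        "AvailabilityZones": list(zones),
        "HealthCheckType": "EC2",
        "HealthCheckGracePeriod": 300,  # assumed value, tune for your boot time
    }


# Hypothetical names, for illustration only.
params = single_instance_asg_params(
    "web-single", "web-lc-v1", ["us-east-1a", "us-east-1b"]
)
```

With credentials configured, passing these to `boto3.client("autoscaling").create_auto_scaling_group(**params)` should create the group, though I'd double check the launch configuration exists first.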

Point 2: AWS is likely trying to rotate capacity for updates, which means they need to evict instances that are running on doms that they need to update/deprecate/etc. The longer your instances are running (or the more specialized the type of instance is), the more likely you'll see an eviction notice. It should be part of good practice to launch new instances often as new AMIs become available, or as private AMIs are updated for security patches, etc. - at least monthly! Autoscaling and solid config management simplify this practice greatly.
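A rough sketch of the "at least monthly" rule, assuming you already have each instance's launch time from somewhere (for example the `LaunchTime` field that EC2 reports). The instance IDs and dates below are invented, and the 30-day threshold is just one reading of "monthly".

```python
from datetime import datetime, timedelta, timezone


def instances_due_for_rotation(launch_times, now, max_age_days=30):
    """Return the IDs of instances that have been running longer than max_age_days."""
    max_age = timedelta(days=max_age_days)
    return sorted(
        instance_id
        for instance_id, launched in launch_times.items()
        if now - launched > max_age
    )


# Invented data, for illustration only.
now = datetime(2014, 9, 25, tzinfo=timezone.utc)
launch_times = {
    "i-old": datetime(2014, 6, 1, tzinfo=timezone.utc),
    "i-new": datetime(2014, 9, 20, tzinfo=timezone.utc),
}
due = instances_due_for_rotation(launch_times, now)
```

Anything in `due` would be a candidate for replacement from a fresh AMI; in an autoscaling group that can be as simple as terminating it and letting the group relaunch.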

Good Luck!



They are not rotating capacity for updates. They are patching a Xen security issue that will be announced on Oct 1. That is why they are rebooting machines and not forcing moves off of those machines. Otherwise, I agree with the advice.


Could you please confirm or provide evidence for such speculation?

I don't see anything about this on google.

I heard it is because they are having power issues within their datacenter.


Maybe this: http://xenbits.xen.org/xsa/

XSA-108 2014-10-01 12:00 (Prereleased, but embargoed)


Cool speculation!

Anyone with anything concrete?

I still heard it is power issues.


Not sure how power issues would affect every single region. Logic dictates it's likely a security issue.


I'd just like to point out that you asked for concrete, then speculated based on something you heard.


Aren't you speculating as well?


Restarting an instance (not stop-start) doesn't change the hardware you're on, so I don't think this has a physical explanation.


all of our instances scheduled for maint. are indeed system-reboot event types; it is indeed a stop/start situation.

stop/start can possibly put you on another physical host - but it all depends on how aws has set up the hypervisors and their instance schedulers.

this, the maint from aws, appears to be security related - but that doesn't mean that aws is not getting folks off of old hardware if they have that desire.
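if you want to check which of your own instances carry a system-reboot event, here is a small sketch that filters a response shaped like the EC2 DescribeInstanceStatus output (`InstanceStatuses` -> `Events` -> `Code` / `NotBefore`). the sample data is made up, and the real API returns timestamps as datetime objects rather than the strings used here.

```python
def scheduled_events(status_response, code="system-reboot"):
    """Return (instance_id, not_before) pairs for events matching the given code."""
    matches = []
    for status in status_response.get("InstanceStatuses", []):
        for event in status.get("Events", []):
            if event.get("Code") == code:
                matches.append((status["InstanceId"], event.get("NotBefore")))
    return matches


# Made-up response, for illustration only.
sample = {
    "InstanceStatuses": [
        {
            "InstanceId": "i-0aaa",
            "Events": [
                {"Code": "system-reboot", "NotBefore": "2014-09-26T02:00:00Z"}
            ],
        },
        {
            "InstanceId": "i-0bbb",
            "Events": [
                {"Code": "instance-retirement", "NotBefore": "2014-10-05T00:00:00Z"}
            ],
        },
        {"InstanceId": "i-0ccc", "Events": []},
    ]
}
reboots = scheduled_events(sample)
```

the same function with `code="instance-retirement"` would pick out the instances aws actually wants moved off a host, which is a different situation from a reboot in place.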


I think it's more complicated than that, because Amazon is claiming that if you let them handle it, you keep your instance data. That's not a stop-start (at least not the kind we ordinary users can do).

It's more like a system restart with a little downtime managed by them.

You can try a stop-start yourself, but it's not guaranteed to help. And restarting the instance yourself doesn't do anything.


within their datacenter? Wtf? Do you have any idea how big AWS is?


This helps services, but at some point you have to run a database layer too ;) Cassandra helps, but it's not the whole story during large-scale, closely spaced reboots like these.

It ends up being a decent bit of manual operator time spent when security patches force AWS to reboot. You have to be pretty careful about any service that can't lose members as quickly due to bootstrap times or technology limitations, e.g. RDBMSes.

For things running in an ASG, it's trivial to let it die or just kill it.


Each AZ looks to me like it is a day apart. Surely that is enough time.


Autoscaling is your friend, but you can also use "auto-healing" if your stack is built on Amazon's AWS OpsWorks and you just want to keep a single instance alive. It will automatically spawn a replacement instance and reattach and mount any EBS volumes.


My understanding is that that doesn't work correctly in the case of AZ failure; the EBS data isn't duped to another AZ and so your instance will fail to come up. So it's not really a solution.


Quite possible. But it depends on what level of disaster you want to protect yourself from. Single EC2 instance termination is much more common than an entire availability zone going down. I'd say OpsWorks auto-healing is better than just running a standalone EC2 instance, and configuring a full auto-scaling setup with your own custom AMIs and boot scripts is even better, but also much more work.


Fair enough. Not saying it's not usable for some stuff, by any means. (Though OpsWorks in general leaves me feeling a little itchy.)


Yes, isn't it wonderful that you have to completely remake your autoscaling groups every time you add/remove/modify an ELB. Thank Cthulhu (or opscode...) for chef.



