There are so many outages in us-east-1. I've heard it's because that's where they roll out maintenance first, or something along those lines. Just look at this list of outages on Wikipedia [1] and scan for us-east-1 or Northern Virginia (they refer to the same place).
* It's the largest region (ever had an unexpected scaling bug?).
* It has more legacy stuff lying around. For example, old regions have EC2 Classic, while new regions are VPC only.
* There are more customers there. More whales, more use cases.
Most AWS teams explicitly try not to deploy to us-east-1 first, but because us-east-1 is so different on so many dimensions, it is more likely to have issues that don't manifest elsewhere.
> I've heard it's because that's where they roll out maintenance first
That doesn't make sense - why would they do maintenance in their largest (and oldest) region first? I'd expect them to roll out changes to smaller regions first, so problems affect fewer users.
I think the more likely explanation is that it's their largest (and oldest) region.
An AWS TAM once told me the same thing: us-east-1a gets the new stuff first. I never validated it against anything other than that one person's statement.
> Between 9:21 AM and 2:36 PM PDT we experienced increased query failures and latency in the US-EAST-1 Region. The issue has been resolved and the service is operating normally.
> The issue with the Data Catalog APIs started with a software update in the US-EAST-1 Region that completed at 9:21 AM PDT. The software update was immediately rolled back[...]
Thankfully the Redshift outage only hit the APIs, not existing clusters. Our cluster was fine today, but external schemas that rely on Glue/Athena did time out.
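For anyone wondering why a Glue incident shows up as Redshift timeouts: external schemas created FROM DATA CATALOG resolve their table metadata through Glue Data Catalog API calls, so those queries fail even when the cluster itself is healthy. Here's a minimal sketch of probing that dependency with boto3 during an incident; the database name "analytics" and the timeout values are just placeholders, not anything from our setup.

```python
import boto3
from botocore.config import Config

# Short timeouts and no retries so a Glue API incident surfaces quickly
# instead of hanging the way a Spectrum query would.
glue = boto3.client(
    "glue",
    region_name="us-east-1",
    config=Config(connect_timeout=5, read_timeout=10, retries={"max_attempts": 1}),
)

try:
    # Redshift Spectrum external schemas resolve tables through calls like
    # this one; if it times out, queries against the external schema will too.
    # "analytics" is a hypothetical Glue database name.
    resp = glue.get_tables(DatabaseName="analytics")
    print(f"Glue Data Catalog reachable: {len(resp['TableList'])} tables")
except Exception as exc:
    print(f"Glue Data Catalog call failed: {exc}")
```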