The load balancer itself can always be a single point of failure, no matter how ...

rdl · on July 3, 2012

Normally you Anycast DNS only, and then use DNS load balancing (of IPs) to load balance physical load balancers (which are themselves sharing IPs on subnets and doing high availability stuff) in front of banks of servers.

You probably implement geo-, health-, and load-based optimization in the DNS layer and also in the physical load balancers. A single host out of ~30 dying at Site A doesn't affect how the DNS load balancing responds, but a critical number does.
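As a rough illustration of the DNS-layer decision described above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Site` fields, the 50% "critical number" threshold, the load ceiling, and the documentation-range IPs. Real DNS load balancers use their own health checks and geo data.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    ip: str             # virtual IP shared by the site's load balancers (hypothetical)
    healthy_hosts: int
    total_hosts: int
    distance_km: float  # rough distance from the querying resolver (geo input)
    load: float         # 0.0 (idle) to 1.0 (saturated)


# Assumed threshold: below this fraction of healthy hosts, stop handing out the site.
MIN_HEALTHY_FRACTION = 0.5


def is_eligible(site: Site) -> bool:
    """A site stays in DNS answers until a critical number of hosts has died."""
    if site.total_hosts == 0:
        return False
    return site.healthy_hosts / site.total_hosts >= MIN_HEALTHY_FRACTION


def pick_sites(sites: list[Site], max_load: float = 0.9) -> list[Site]:
    """Return eligible sites, nearest first, preferring sites under the load ceiling."""
    eligible = [s for s in sites if is_eligible(s)]
    # If every eligible site is over the ceiling, still answer with something.
    candidates = [s for s in eligible if s.load < max_load] or eligible
    return sorted(candidates, key=lambda s: s.distance_km)


sites = [
    Site("A", "192.0.2.10", healthy_hosts=29, total_hosts=30, distance_km=50.0, load=0.4),
    Site("B", "198.51.100.10", healthy_hosts=30, total_hosts=30, distance_km=4000.0, load=0.3),
]

# One dead host out of 30 at Site A changes nothing: A is still the nearest answer.
print([s.name for s in pick_sites(sites)])
```

With one host down, Site A is still returned first. Only when its healthy fraction drops below the assumed threshold does it disappear from the answers.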

Due to broken DNS implementations in the wild, you ideally keep your per-site load balancers up in as many situations as possible (even if load must be shed, you keep core network and load balancers up) and redirect traffic from there to other sites.
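The "keep the load balancers up and redirect from there" idea could be sketched like this in Python. The function name, the load numbers, and the single shared threshold are all hypothetical. A real load balancer would make this decision from its own health and capacity signals.

```python
from typing import Optional


def route_request(
    local_load: float,
    shed_threshold: float,
    peer_loads: dict[str, float],
) -> tuple[str, Optional[str]]:
    """Decide what a site's load balancer does with an incoming request.

    Returns ("serve", None) when the local site has capacity,
    ("redirect", peer_name) when load must be shed to another site,
    or ("reject", None) when no site has capacity.
    """
    if local_load < shed_threshold:
        return ("serve", None)
    # Local site is shedding load: try the least-loaded peer first.
    for name, load in sorted(peer_loads.items(), key=lambda kv: kv[1]):
        if load < shed_threshold:
            return ("redirect", name)
    return ("reject", None)


# Site A is overloaded, so traffic that still arrives there is sent on to a peer.
print(route_request(0.95, 0.9, {"B": 0.2, "C": 0.1}))
```

The point of the sketch is that clients with stale DNS answers still reach a working load balancer, which can hand them off even when the servers behind it cannot take the traffic.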

For larger sites, the big choice is running a single AS global network where IPs can move between datacenters vs. running each datacenter as a separate network. Advantages and disadvantages to each.

The big disadvantage of all of this is you're now spending a few million a month on salaries, infrastructure (various PoPs, network gear, etc.) before you've bought a single server or served a single page. Not so lean.

zhoutong · on July 3, 2012

> The big disadvantage of all of this is you're now spending a few million a month on salaries, infrastructure (various PoPs, network gear, etc.) before you've bought a single server or served a single page. Not so lean.

So a business opportunity kicks in. That's how the cloud revolutionizes computing.

rdl · on July 3, 2012

The problem I think is that "really hard parts of the front end of your app, as a service" is only really valuable once someone has also provided "really hard parts of the back end of your app (database distribution/replication across multiple sites, ideally with resilience against various failures, and your choice on CAP), as a service", too.

(On the front end, you should probably also be doing CDN+ (caching, WAN TCP acceleration, DoS prevention).)

[also, wow, I didn't realize you were the "famous" Zhou Tong.]

zhoutong · on July 3, 2012

Database distribution is definitely a pain for RDBMS. A completely re-invented distributed database system like Cassandra is what's needed. But this simply means that a random blogger can't deploy WordPress on this kind of highly available hosting either.

In any case, AWS is probably closest to this kind of revolution, as they already have different types of managed database services (relational, object, in-memory and key-value). They also have Route 53, which is Anycast-based, and CloudFront.

[What's right/wrong/surprising with my identity?]

rdl · on July 3, 2012

AWS does have a shot at building a system like that, but I'd rather it be made of open components from multiple vendors.

[I wasn't following the bitcoin stuff as it happened, and then later read about Bitcoinica (we actually use the difficulties with hosting providers you faced as an example with our unlaunched product...). I didn't know you were on HN -- awesome. (You are probably one of the world's experts on problems with service provider internal security now, although it was an expensive education.) And handling things by paying everyone out was a much better decision than most startups make after a breach.]

zhoutong · on July 3, 2012

Yeah. I hope OpenStack can catch up soon.

[The first line of the comment was intended to keep this on-topic. I didn't really have any financial interest in Bitcoinica at the time of the hack. A major management handover took place three weeks before that, and 100% of the company was sold in 2011. The community assumed that I was still the owner, but that's simply not true. The hacked mail server (the "root cause") didn't belong to me either. But yes, I learned a lot by being both an insider and an outsider, and these are fortunately free lessons. I still apply some of the valuable experience from dealing with Bitcoinica's infrastructure in my new project, which doesn't deal with money. Startup infrastructure has a big market. Now I use KVM to build a small private cloud for my new project on top of dedicated server(s), simply because I love the flexibility of cloud deployments. If someone brings that to developers who are not sysadmins, it'll be cool.]