I'm not at all against horizontally scaling. However, I don't believe that horizontally scaling should be necessary doing a mere 200 updates to per second to a data store that isn't even fsyncing writes to disk.
Think of it in terms of cost per ops. Let's just say 200 update ops per second is the point at which you need to shard (not scientific, but let's just use that as a benchmark since that is what we saw at Kiip). MongoDB likes memory, so let's use high-memory AWS instances as a cost benchmark. I think this is fair since MongoDB advertises itself as a DB built for the cloud. The cheapest high-memory instance is around $330/month.
That gives you a cost per op of 6.37e-5 cents per update operation.
Let's compare this to PostgreSQL, which we've had in production for a couple months at Kiip now. Our PostgreSQL master server has peaked at around 1000 updates per second without issue, and also with the bonus that it doesn't block reads for no reason. The cost per op for PostgreSQL is 1.27e-5 cents.
Therefore, if you're swimming in money, then MongoDB seems like a great way to scale. However, we try to be more efficient with our infrastructure expenditures.
I like your post, and I agree with your conclusion, but I have to say I'm puzzled by your decision to back MongoDB with EBS. Were you running MongoDB atop EC2 instances as well? Can you elaborate on this a little?
We were running running MongoDB atop EC2 instances. We chose to back MongoDB with EBS because that was the only reasonable way to get base backups (via snapshots) of the database. Although 10gen recommends using replica sets for backup, we also wanted a way to durably backup our database since there was so much important data in it (user accounts, billing, and so on).
On the other hand, we run PostgreSQL straight on top of a RAID of ephemeral drives, which has had good throughput compared to EBS so far. The reason we're able to do this is because PostgreSQL provides a mechanism for doing base backups safely without having to snapshot the disk[1]. Therefore, we just do an S3 base backup of our entire data (which uploads at 65 MB/s from within EC2) fairly infrequently, while doing WAL shipping very frequently.
You can either do LVM snapshots (with journaling) on the ephemeral drives, or use mongodump with the oplog option to get consistant "hot" backups. The downside of mongodump is it churns your working set.
Interesting. Thanks for the reply and breaking it down the way you did. That provides some serious food for thought. Looking forward to your next post on the rationales for the other data stores.
Can you give any sort of indication of the value of a schemaless database and the flexibility it provided as the team fleshed out the data model? Was this a mere convenience over traditional schema migration or something more?
I'm not at all against horizontally scaling. However, I don't believe that horizontally scaling should be necessary doing a mere 200 updates to per second to a data store that isn't even fsyncing writes to disk.
Think of it in terms of cost per ops. Let's just say 200 update ops per second is the point at which you need to shard (not scientific, but let's just use that as a benchmark since that is what we saw at Kiip). MongoDB likes memory, so let's use high-memory AWS instances as a cost benchmark. I think this is fair since MongoDB advertises itself as a DB built for the cloud. The cheapest high-memory instance is around $330/month.
That gives you a cost per op of 6.37e-5 cents per update operation.
Let's compare this to PostgreSQL, which we've had in production for a couple months at Kiip now. Our PostgreSQL master server has peaked at around 1000 updates per second without issue, and also with the bonus that it doesn't block reads for no reason. The cost per op for PostgreSQL is 1.27e-5 cents.
Therefore, if you're swimming in money, then MongoDB seems like a great way to scale. However, we try to be more efficient with our infrastructure expenditures.
EDIT: Updated numbers, math is hard.