MongoDB 1.8 (stable) released

dstorrs · on March 16, 2011

Could someone with Mongo experience help me gut-check this?

I want my data store to be durable and unsurprising -- barring a hardware failure or such, if I submit data it should either tell me that it failed to commit or it should be stored durably and without surprises (e.g., it should not truncate a long string to fit).

I've read some of the Mongo docco, and it's pretty exciting, but the lack of ACID -- primarily the Durability -- has kept me from really using it.

With a WAL journal, it sounds like maybe the durability issue is fixed. Is it? Could I use Mongo with relatively out-of-the-box settings plus --journal and count on a level of durability equivalent to a traditional RDBMS?

dmytton · on March 16, 2011

Yes, if you combine journaling with safe write operations. This allows you to call the appropriate driver method to confirm that the data has been written. You can also wait for it to be written to n slaves in replication. I believe that "written" means either to the data files or the journal. In the case of the journal, a hard crash would result in the journal being replayed so that the data is then written to disk and you don't get corruption.
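To make the replay idea concrete, here is a toy Python sketch of a write-ahead journal. It is not how mongod implements journaling; the `ToyStore` class, its one-JSON-record-per-line file format, and the `safe_write` method are all invented for illustration.

```python
import json
import os
import tempfile


class ToyStore:
    """Hypothetical key-value store with a write-ahead journal (illustration only)."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.data = {}      # stands in for the "data files"
        self._replay()      # recover anything journaled before a crash

    def _replay(self):
        if not os.path.exists(self.journal_path):
            return
        with open(self.journal_path) as f:
            for line in f:
                try:
                    entry = json.loads(line)
                except ValueError:
                    break   # torn final record from a crash: stop replaying
                self.data[entry["key"]] = entry["value"]

    def safe_write(self, key, value):
        """Return only after the journal record has been flushed to disk."""
        with open(self.journal_path, "a") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.data[key] = value


path = os.path.join(tempfile.mkdtemp(), "journal.log")
store = ToyStore(path)
store.safe_write("user:1", {"name": "dstorrs"})

# Simulate a hard crash: in-memory state is lost, the journal file survives.
recovered = ToyStore(path)
print(recovered.data)
```

The point of the sketch is the ordering: the write is acknowledged only after the journal record is on disk, so a restart can rebuild the state by replaying the file.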

For example, in the PHP driver you can call insert with the safe option: http://www.php.net/manual/en/mongocollection.insert.php

"If safe is an integer, will replicate the insert to that many machines before returning success (or throw an exception if the replication times out, see wtimeout)."

You can then immediately call http://www.php.net/manual/en/mongodb.lasterror.php to confirm the last operation didn't error.

joelhaasnoot · on March 16, 2011

Don't know, I'm going to be testing this tomorrow, but came to work today to a server that was grinding to a halt due to wild running node.js processes. Restarted the server, but that doesn't cleanly shutdown Mongo, and spent an hour repairing everything and setting permissions right (somehow they had gotten reset). This exact problem, so hope the --journal switch in the init file makes the difference.

joelhaasnoot · on March 17, 2011

Added "journal = true" to the config and it certainly stops and starts the service nicely now. No more repairs and such, my data is currently query only, so it's mainly "optical", cleaning up locks and such.

kchodorow · on March 16, 2011

Journaling should give you the crash-safety you're looking for. You should combine it with safe writes to get the commit safety you want (not the default, but is easy to choose, see your driver's documentation).

The mailing list is a great place for this type of question, too (http://groups.google.com/group/mongodb-user).

runevault · on March 16, 2011

My understanding is this release is the first stable with single node durability available as an option, which is why I'm considering picking Mongo back up.

loxs · on March 17, 2011

I am aware that I may be opening the flame portal, but what you want is called CouchDB. No more and no less.

rb2k_ · on March 16, 2011

Can't wait until they start implementing filtered indexes ( http://jira.mongodb.org/browse/SERVER-785 ). Sparse indexes are a step in the right direction, but filtered ones would be just a bit cooler :)

"New map/reduce options for incremental updates" would also be really cool if they had a way to do something like couchDBs incremental views. This would require keeping track of changes or a "trigger" functionality that runs the m/r task after every x inserts

rbranson · on March 16, 2011

Starting to get excited once again about MongoDB. I was kind of down about it after having some issues with real world implementations. Considering journaling is something I would never have thought would have made it in, I wonder if they will come around on the memory mapped I/O like everyone else eventually does.

EDIT: Also... does the group commit mean that ALL write transactions will be un-acknowledged to the client until the group commit finishes?

tmountain · on March 16, 2011

I'm curious what you're getting at IRT memory mapped I/O. To me, one of Mongo's selling points is the way it lets the OS manage caching for you instead of bothering you with tuning a bunch of buffers and things along those lines. For the sake of disclosure, I'm running 6 servers in a pretty high volume cluster, and while I've had to address some issues here and there, memory management hasn't been one of them.

dmytton · on March 16, 2011

"You can wait for group commit acknowledgement with the getLastError command. When running with --journal, the fsync:true option returns after the data is physically written to the journal (rather than actually fsync'ing all the data files). Note that the group commit interval (see above) is considerable: you may prefer to call getLastError without fsync, or with a w: parameter instead with replication. In releases after 1.8.0 the delay for commit acknowledgement will be shorter."

http://www.mongodb.org/display/DOCS/Journaling

rbranson · on March 16, 2011

RTFM'ing myself in public. While it's a great step forward, the whole thing sounds pretty immature at this point. It'll be a good day when this stuff matures and MongoDB has fsync:true and --journal by default, much to the chagrin of sensationalists everywhere.

They'll really need to add some major smarts into the journaling and group commit if they're going to be able to stack up concurrent I/Os to help feed big disk arrays and even to get the best use out of SSDs which make back-and-forth latency on the I/O pipes even more significant.

mathias_10gen · on March 16, 2011

Clients are able to wait for the next group commit by adding fsync:true to getLastError calls (some drivers allow you to add this to WriteConcern). We already have some enhancements to this planned for the 1.9 series.
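As a rough sketch of the options discussed in this sub-thread, the command document a client sends might be assembled like this in Python. The helper name `get_last_error_command` and its `wtimeout_ms` parameter name are hypothetical; the `fsync`, `w`, and `wtimeout` keys are the ones mentioned in the quoted docs, and how you actually send the command depends on your driver.

```python
def get_last_error_command(fsync=False, w=None, wtimeout_ms=None):
    """Build a getLastError command document (hypothetical helper)."""
    command = {"getlasterror": 1}   # the command name must be the first key
    if fsync:
        command["fsync"] = True     # with --journal: wait for the group commit
    if w is not None:
        command["w"] = w            # wait until w servers have the write
    if wtimeout_ms is not None:
        command["wtimeout"] = wtimeout_ms
    return command


journal_ack = get_last_error_command(fsync=True)
replica_ack = get_last_error_command(w=2, wtimeout_ms=5000)
print(journal_ack)
print(replica_ack)
```

Most drivers wrap this for you (the "safe" option or a WriteConcern), so building the document by hand is only useful for seeing what is being asked of the server.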

davidw · on March 16, 2011

MongoDB is not something I have a good handle on yet. What's its sweet spot? Where should I consider using it instead of Postgres?

kchodorow · on March 16, 2011

A lot of people like it because it makes development faster. It's like the scripting language of databases: you can get stuff out the door really fast (with the obvious power/responsibility caveats).

DonnyV · on March 16, 2011

This was the main reason why I picked it for a project. It's just sick the amount of code you DON'T have to write to use this database. I literally have 1 class called GenericCRUD with 5 functions that do all my database functions for all my models. Plus not having to worry about stored procedures anymore is a heavy weight lifted off your shoulders.

chadcf · on March 17, 2011

I've used it once, for an app that holds data which was not a good fit for a traditional relational database. The app in question essentially involved collecting data in a web app and then using that data to fill out hundreds of PDF forms. It gets really complicated as the data has to be potentially formatted (say a phone number might need to be split 555-555-5555 on one form but one number per square on another), concatenated (name might need to join first, last mi), as well as data about what page and x/y coordinates things go on for each form.

Initial attempts in SQL were painful. The only real way to do it was a key value table, but that gets painful when it comes to formatting for web presence (notably, each document has sections with a group of fields, plus some fields may need to be grouped together such as a series of checkboxes, or parts of a name). So at that point we're looking at writing up XML files to describe the presentation of these 200 forms from a key/value table to the web app.

At that point I realized this was doable, but going to be a mess. Enter mongo. Mongo essentially lets us store a dynamic schema of documents. For each form we can stick it all in a single document, as a series of embedded models, with all metadata and values needed in one go. We also get nice revision control within that using mongoid. We can now fetch all the data for a form, as well as save all the data for the form, in one VERY fast atomic operation (we're talking 100-800 field definitions for each form). Having never used mongo, it only took me a few days to implement this complete with handling for all field types and performance was fantastic.

Mongo also made it quite easy to populate our data since we're essentially just storing a tree of keys and values. We wrote up a tool that loads up the PDFs and lets us draw boxes on top of the fields and set up the metadata, then export that to a YAML file for each form. The YAML is then stored in a traditional SQL database and is used to create a new form in the system by simply converting it to a nested hash and having mongoid save it. Slick.

I'm getting a bit wordy here, but I think it's a great real world example of the type of problem mongo is a good fit for. I wouldn't personally use mongo for something that a relational database is a good fit for, but for something like this it allows you to solve the problem quicker and with significantly less code to maintain (really, the CRUD code for forms is no more than with SQL and probably less since it's only one operation on a document, and my pdf form generator is < 200 lines of ruby).
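A hypothetical Python sketch of the shape described above: one document per form, with each field carrying its own formatting rule and placement. Every field name here (`fields`, `style`, `page`, `x`, `y`) and the `render_phone` helper are invented for illustration and are not taken from the actual app.

```python
def render_phone(digits, style):
    """Format a 10-digit phone number for a particular form layout."""
    if style == "dashed":
        return [f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"]
    if style == "one_per_box":
        return list(digits)     # one character per square on the form
    raise ValueError(f"unknown style: {style}")


# One document per form: metadata, placement, and formatting travel together.
form = {
    "name": "example-form",
    "fields": [
        {"key": "phone", "style": "dashed", "page": 1, "x": 120, "y": 540},
        {"key": "phone", "style": "one_per_box", "page": 2, "x": 80, "y": 300},
    ],
}
answers = {"phone": "5555555555"}

for field in form["fields"]:
    print(field["page"], render_phone(answers[field["key"]], field["style"]))
```

Because the whole form is one nested structure, loading or saving it is a single read or write of that document rather than a join across key/value rows.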

wladimir · on March 17, 2011

It's schema-less, and as the data format is a binary form of JSON, you can store arbitrary, even nested data structures directly. Indexes can be put on fields deep within the structure.

This saves a lot of time you'd normally spend defining schemas, and is very flexible. It didn't completely replace SQL for me, but it's a good fit for the heterogeneous free-form data generally encountered on the web.
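Mongo addresses fields inside nested documents with dotted paths. The Python sketch below only mimics that lookup on a plain dict to show what "deep within the structure" means; `get_path` and the sample document are made up for the example and are not part of any driver.

```python
def get_path(document, dotted_path):
    """Walk a nested dict using a dotted path such as 'address.geo.lat'."""
    current = document
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None     # path does not exist in this document
        current = current[part]
    return current


user = {
    "name": "wladimir",
    "address": {"city": "Amsterdam", "geo": {"lat": 52.37, "lon": 4.89}},
}

print(get_path(user, "address.geo.lat"))
print(get_path(user, "address.zip"))
```

Documents in the same collection do not all need the same fields, which is why the missing path simply yields nothing instead of being a schema error.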

scorpioxy · on March 16, 2011

I use it for offline operations since we only have a single overloaded MS SQL database server. So i sync the data to my local mongo instance creating documents with only the fields that i need. And i have scripts written to do analytics and partition the data to reveal patterns and generate reports and so on.

It's very fast and the flexible schema makes the code much more flexible and easy to write. And did i mention it was fast?

You should definitely give it a try and consider using it for such systems.

My only issue with it is that i am running it on a 32-bit system and so i'm limited to 2GB a database.

flourophore · on March 16, 2011

If you are in the NYC area, you should check out the MongoDB track at PgEast next week to find out ;) https://www.postgresqlconference.org/

andrewjshults · on March 16, 2011

If you are working with location data, Mongo has had geospatial indexing built in since 1.4 (earlier in the unstable builds) - http://www.mongodb.org/display/DOCS/Geospatial+Indexing which has been a big draw for a number of people I know using it (it looks like 1.8 brings spherical distances to the stable branch, which makes the geo lookups a lot more useful if you need accurate distances and not just nearby lookups).

look_lookatme · on March 16, 2011

In reference to the OP, when comparing Mongo and Postgres, I wouldn't bring up geo...

igrekel · on March 16, 2011

Interesting I was planning to toy with location data, do you know how it compares with Postgresql extension PostGIS?

crux_ · on March 16, 2011

If by "location data" you have points, and the only location operations you need are distance or bounding-box searches, it may do the trick.

If you're interested in polygons, lines, etc; more physically accurate (and completely implemented) distance queries, spatial joins, aggregation, 3D and surveyor-annotated data, set-theoretic operations ... PostGIS is far and away the way to go. It's far more mature and debugged than any of the NoSQL geospatial stuff I've seen, not only WRT correctness but also performance.

As a point of reference: There's a growing legion of geographers who do all their vector work in SQL using PostGIS.

All that said, for some applications, being tied to the relational model is a deal breaker. Just know that in terms of capability and maturity on the geospatial front, you'll be trading off a Cadillac for a partially assembled rocket sled.

crux_ · on March 16, 2011

I got excited and looked it up.

> We don't currently handle wrapping at the poles or at the transition from -180° to +180° longitude, however we detect when a search would wrap and raise an error.

generalized grumble

Why does everyone always seem to punt on doing geospatial right? It's not _that_ hard.
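For the point-to-point part at least, a standard haversine formula handles the ±180° transition without special casing. A minimal Python sketch, assuming a perfectly spherical Earth (the radius constant and function name are choices made here, and an indexed search that wraps is a harder problem than this):

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; the sphere itself is an approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to guard against rounding pushing a just above 1 near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Two points on the equator, one degree either side of the antimeridian.
across_dateline = haversine_km(179.0, 0.0, -179.0, 0.0)
# The same two-degree separation, nowhere near the antimeridian.
mid_ocean = haversine_km(-1.0, 0.0, 1.0, 0.0)
print(round(across_dateline, 1), round(mid_ocean, 1))
```

Both calls give the same distance because the squared sine of half the longitude difference is identical whether the difference is 2° or -358°.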

gt384u · on March 17, 2011

Why not go ahead and help them? I'm sure they'd be happy to have the assistance. Fork mongo from github at https://github.com/mongodb/mongo . Happy hacking!

crux_ · on March 17, 2011

You know, nobody ever replies to these comments with an "I will", but I'm seriously considering it.

(It's a nice opportunity to publicly show off a specialty/core competency and brush up a bit on C++ at the same time. I'm not that easily provoked into action by internet commentary! ;) )

But, looking at the source, I think I will probably be a Bad Contributor and end up with a gigantic pull request and a (mostly) full re-implementation...

thibaut_barrere · on March 17, 2011

> Why does everyone always seem to punt on doing geospatial right? It's not _that_ hard.

Do you mean you think they don't know how to do it?

As I followed the roadmap on this specific point, it looks more like an incremental development to me: they first used rectangular coordinates in 1.6, then a spherical model in 1.7 etc.

It lets them bring a more lightweight solution quickly to people who need it (like me), then evolve it based on the feedback etc.

crux_ · on March 17, 2011

The problem I have is at least partly one of truth in advertising.

For example: If they were truly using a "spherical model", then one would not expect to have queries fail at the poles & dateline, would you?

At least it is documented and fails hard with an error rather than giving wrong results, so a developer can quickly figure out the weak spots --- though I bet a lot of people would prefer the wrong results to queries that cause exceptions in their systems.

> Do you mean you think they don't know how to do it?

I think it has more to do with the absurdly low bar they've set for themselves to check the "geospatial" box than it does with competence.

thibaut_barrere · on March 17, 2011

I don't know really - if limitations are documented like they are apparently, it's really not an issue for me - I don't feel cheated (but it's really an opinion!).

Your mileage may vary as they say: I used the GIS since 1.6 and it was very helpful for me in this form already :)

crux_ · on March 17, 2011

The limitations are documented as foot/side notes.

Analogy: It's like seeing "ACID compliance!" on a feature list, then finding buried in the documentation that is only the case for single-document transactions in unordered collections on a single machine only.

The new feature might be useful to some but including it on a feature list without disclaimer is misleading.

thibaut_barrere · on March 17, 2011

I understand your analogy really :)

Really curious: are you using mongo currently? Or browsing the docs?

crux_ · on March 17, 2011

I'm not using it directly, but sometimes I use tools that in turn use mongo. -- So for now, browsing docs.

joegester · on March 17, 2011

My experience has been that many applications (web apps in particular) use a relational database to break up documents into an SQL schema so they can be indexed, then assemble them again when needed. Mongo really ratchets down the friction on that operation. Instead of spending time building a schema and writing lots and lots of insert and update statements, you just build a JSON object and send it over.

thibaut_barrere · on March 17, 2011

For me the sweet spot is its flexibility.

I work a lot on data aggregation, where I can create a bunch of tables each day, then maintain them etc. For me it's almost a dream really :)

As well they are adding features such as geonear that makes it appealing for other uses (which I have, too).

yawniek · on March 16, 2011

some of the sweet spots where it _might_ be better than postgres are:

- horizontal scalability

- flexible datastructures

- map/reduce

rbranson · on March 16, 2011

It is good at web scale.

rch · on March 16, 2011

This is exciting: db.users.mapReduce(map, reduce, {out: { inline : 1}});

This is Not exciting:

"Note that this option is possible only when the result set fits within the 16MB limit of a single document."

rit · on March 16, 2011

I did a writeup, for what it's worth, on the new MapReduce output options:

http://blog.evilmonkeylabs.com/2011/01/27/MongoDB-1_8-MapRed...

(Disclaimer, I work for 10gen / MongoDB)

rch · on March 16, 2011

Very nice - worth the reading. I'm particularly interested in seeing how you've implemented 'merge' and 'reduce' output.
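As a rough illustration of what a 'reduce' style output amounts to, here is an in-memory Python sketch: a new batch is map/reduced on its own, then folded into the existing results by running the reduce function again on keys that appear in both. This is a sketch of the general idea only, not the server's implementation; `map_reduce`, `merge_reduce`, and the sample data are invented for the example.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn):
    """Run a tiny in-memory map/reduce and return {key: reduced_value}."""
    emitted = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            emitted[key].append(value)
    return {key: reduce_fn(key, values) for key, values in emitted.items()}


def merge_reduce(existing, fresh, reduce_fn):
    """Fold a fresh result set into an existing one by re-reducing shared keys."""
    merged = dict(existing)
    for key, value in fresh.items():
        if key in merged:
            merged[key] = reduce_fn(key, [merged[key], value])
        else:
            merged[key] = value
    return merged


def count_by_user(doc):
    yield doc["user"], 1


def total(key, values):
    return sum(values)


first_batch = [{"user": "a"}, {"user": "b"}, {"user": "a"}]
new_batch = [{"user": "a"}, {"user": "c"}]

results = map_reduce(first_batch, count_by_user, total)
results = merge_reduce(results, map_reduce(new_batch, count_by_user, total), total)
print(results)
```

This only works when the reduce function can be applied to its own output, which is true for sums and counts but needs care for things like averages.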

You guys do seem to be headed in the right direction technically... I just can't bring myself to say "mongo" out loud.

dacort · on March 16, 2011

Not really a big deal to just output to a new collection and query that.

rch · on March 16, 2011

Sure - I'm just pointing out something that could be a big deal (streaming results through an ordered structure in memory, that may or may not be backed by a lazy disk store) is actually not a big deal, due to a current implementation constraint.

But you're right too, of course.

dacort · on March 16, 2011

Yea, makes sense. That limit almost seems to fall along the lines of "Nobody should need more than 16MB for inline map/reduce results." ;)

rch · on March 16, 2011

cheers to that.

mike_esspe · on March 17, 2011

Warning for those who want to use MongoDB on FreeBSD: there is a known, unfixed bug that locks the database a lot:

http://groups.google.com/group/mongodb-user/browse_thread/th...

http://jira.mongodb.org/browse/SERVER-663

emef · on March 16, 2011

I was really hoping for full-text search support, but I understand the team is busy :/

rabidsnail · on March 16, 2011

All you need for full-text search is to add a multikey of tokens to the documents you want indexed. Tokenizers and stemmers are actually really easy to write, and there are libraries you can use to do that for you.
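A minimal Python sketch of that approach, assuming a very crude tokenizer with no stemming. The `_keywords` field name, the stop-word list, and both helper functions are made up for the example; the resulting array is what you would put a multikey index on.

```python
import re

STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}  # tiny illustrative list


def tokenize(text):
    """Lowercase, split on non-alphanumerics, drop stop words, de-duplicate."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return sorted({w for w in words if w not in STOP_WORDS})


def with_keywords(doc, field="body", target="_keywords"):
    """Return a copy of doc with an array of tokens stored under target."""
    indexed = dict(doc)
    indexed[target] = tokenize(doc.get(field, ""))
    return indexed


post = with_keywords(
    {"title": "1.8", "body": "Journaling is the headline feature of the release"}
)
print(post["_keywords"])  # ['feature', 'headline', 'journaling', 'release']
```

A search then becomes an exact match against the token array, so the query text has to go through the same `tokenize` step as the documents did.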

phsr · on March 16, 2011

As a newb to non-relational databases, but planning on learning one soon, what is the advantage of MongoDB vs Redis? I'm planning to use ruby with either, but was interested if there was a reason to pick one over the other.

A1kmm · on March 16, 2011

I don't think you should just learn one; they are all used for different niches. It depends how your app trades off different things (consistency, reliability of reads, reliability of writes, speed of reads, speed of writes, efficiency of hardware use, and so on).

See http://news.ycombinator.com/item?id=2052852 for a comparison.

phsr · on March 16, 2011

I'm not planning on just learning one, just trying to pick one to learn first

tmountain · on March 16, 2011

They're really two different beasts. Mongo stores things in a manner similar to a relational DB (minus the relations). Think of it as a store for JSON that allows indexing and SQL-style queries using a JSON style syntax. Redis data structures are closer to what you'd find in computer science books (lists, sets, hashes, etc.). You should explore both and see what meets your needs.

JulianMorrison · on March 17, 2011

Redis is for flat structures. MongoDB is for nested.

In both cases, (unlike CouchDB) you can alter data structures by more complex means than simply replacing the whole thing (such as incrementing a counter). In both (again unlike CouchDB) the updates overwrite in place and do not waste space (but also do not preserve past versions or allow readers to overlap writers).

Redis is for stuff that fits in memory. MongoDB scales up to "big data", provided the individual items are moderately sized.

Redis runs in RAM so it's blazingly fast. MongoDB is about as fast as MySQL.

Redis is single threaded so only one operation runs at once (the speed makes this mostly not a problem). Some operations globally block MongoDB, some can run in parallel.

In both, operations are atomic. Redis has transactions of a sort that group operations and ensure the data they relate to is unchanged. MongoDB operations can't be grouped into a transaction, but they can be a lot more complex so they effectively become a transaction (limited to operating on one data item).
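To show what altering a document in place looks like, here is a toy Python interpreter for a small subset of Mongo-style update modifiers. `$inc`, `$set`, and `$push` are real modifier names, but `apply_update` is invented here, handles top-level fields only, and says nothing about how the server applies them atomically.

```python
def apply_update(doc, update):
    """Apply a small subset of update modifiers ($inc, $set, $push) in place."""
    for field, amount in update.get("$inc", {}).items():
        doc[field] = doc.get(field, 0) + amount
    for field, value in update.get("$set", {}).items():
        doc[field] = value
    for field, value in update.get("$push", {}).items():
        doc.setdefault(field, []).append(value)
    return doc


page = {"_id": "home", "views": 41}
apply_update(page, {"$inc": {"views": 1}, "$push": {"referrers": "hn"}})
print(page)
```

The client describes the change rather than sending back a whole new document, which is what lets a counter be bumped without a read-modify-write round trip.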

effkay · on March 16, 2011

That totally depends on what you want to do with Redis/MongoDB. Do you intend to use it as your primary data store? AFAIK One of MongoDB's goals is to be useful for a lot of things you'd normally use a relational database for. Redis on the other hand has a lot of nice things to handle more specialized cases of data.

kchodorow · on March 17, 2011

(I work on MongoDB, but trying to give a balanced opinion.)

Redis is a great key-value store, MongoDB is more of a fully-featured database. Redis has some nice set operations and is pretty easy to learn (all of the commands are here: http://redis.io/commands). MongoDB is also pretty easy to learn (click the "Try it out" button at http://mongodb.org/), but there are a lot of advanced features to learn about.

So, if you need a key-value store, Redis is a great choice. If you want to do something more complex, MongoDB would probably work better.

eldenbishop · on March 17, 2011

Wow. I knew about the durability changes but I had no idea that sparse and covered indexes were coming. These three changes were the biggest drawbacks to mongo for me.

on March 16, 2011

[deleted]

baltcode · on March 16, 2011

Even numbers after the period are good, odd are bad. Large numbers after second period are bad.
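The first half of that rule of thumb, as a tiny Python check. It assumes version strings shaped like "major.minor.patch" and the even-minor-means-stable convention described here; the function name is made up.

```python
def is_stable_release(version):
    """True when the minor (second) version component is even, e.g. '1.8.0'."""
    parts = version.split(".")
    return int(parts[1]) % 2 == 0


print(is_stable_release("1.8.0"))
print(is_stable_release("1.7.5"))
```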

suhail · on March 16, 2011

yay "Tab completion in the shell" and "B-tree index self-compaction" are great. So is --journal.

rb2k_ · on March 16, 2011

Do you happen to know details about the B-tree index self-compaction? I can't seem to find it in their changelog on jira ( http://jira.mongodb.org/browse/SERVER?report=com.atlassian.j... )

dm_mongodb · on March 16, 2011

consolidation of adjacent btree nodes after key deletions when appropriate. arguably should have already done this, but it's there now!

vegai · on March 17, 2011

The clustrix guy criticized mongodb that it locks the whole database quite often. Can anyone confirm or deny that?

vegai · on March 17, 2011

Modded to 0 and never answered. I'll take that as a confirmation.

gaius · on March 16, 2011

But is it web scale?

mrinterweb · on March 17, 2011

I got in so much trouble for a very similar web scale comment. Never shall trite jokes be used on hacker news. You'll get your head bit off. Personally, it gives me a giggle, and I guess that means I am much less mature than most consumers of Hacker News. http://news.ycombinator.com/item?id=2104276

kchodorow · on March 17, 2011

I think it's just an old joke at this point. It's funny the first time you see it and the second time, but the 50th?

mrinterweb · on March 17, 2011

I did admit the joke is trite.

gaius · on March 17, 2011

The thing you have to understand about the NoSQL crowd is there's no actual thinking here, just religious fervour.

gaius · on March 16, 2011

I guess not!