Graphs are a much more elegant way of storing relational data. With graph databases you don't have to mess with tables or joins -- everything is implicitly joined.
For tabular data, relational databases rock. But the relational model doesn't align well with object-oriented programming, so you end up with an ORM layer that adds complexity to your code. And with relational databases, the complexity of your schema grows with the complexity of the data.
The graph-database model simplifies much of this and makes working with the modern-day social graph so much cleaner.
Graphs allow you to do powerful things like find inferences inside the data in ways that would be hard to do with relational databases.
What I always find hard to understand is that when you "query" a graph database, you have to start at a certain node.
User:
Name: John
Age: 43
Country: Sweden
Job: SoftwareEngineer
What if I wanted to get the names of all software engineers in Sweden?
Do I have to create a "job" node type and start traversing from the one for software engineers?
Also: Is sorting possible at all with graph databases? What if I only want people called John that are older than 20?
"Is this too far out of scope for Graph Databases?"
No, it's just that ad-hoc boolean queries are better suited for other query strategies.
That's not to say that you couldn't use a graph database alongside another strategy. Neo4j, for example, has built-in integration with Lucene. These sorts of boolean logic queries are best addressed by your typical B-tree index.
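As a rough illustration of why an ordered index suits the "called John and older than 20" question, here is a minimal sketch that uses a sorted Python list with `bisect` as a stand-in for a B-tree keyed on (name, age). The rows, the `older_than` helper, and the key layout are all assumptions made for the example, not anything Neo4j or Lucene exposes.

```python
import bisect

# Hypothetical (name, age, id) rows kept in sorted order,
# standing in for a B-tree index on (name, age).
rows = sorted([
    ("Anna", 31, 2),
    ("John", 19, 3),
    ("John", 43, 1),
    ("John", 20, 4),
])

def older_than(rows, name, min_age):
    """Return ids of rows with this exact name and age strictly above min_age."""
    # First position after every (name, min_age, *) entry.
    start = bisect.bisect_right(rows, (name, min_age, float("inf")))
    # First position past every entry carrying this exact name.
    end = bisect.bisect_left(rows, (name + "\0",))
    return [row_id for (_, _, row_id) in rows[start:end]]

johns_over_20 = older_than(rows, "John", 20)
```

Both bounds are found by binary search, so the query touches only the matching slice instead of scanning every person.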
Note that, in general, you want to prefer relationships over attributes. Depending on the application, you'd probably have something more like this:
To accomplish the boolean AND/OR queries, you'd store the key/value pairs from the User object in your comment in Lucene. But for the general cases of your app, you'd encode the relationships you need for particular, non-ad-hoc queries. Nodes and relationships can be used like pointers to create data structures in your graph. From there, standard algorithms generally apply.
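To make the key/value idea concrete, here is a minimal sketch of an inverted index over the User fields, with a boolean AND done as a set intersection. It is plain Python rather than Lucene; the sample records and the `build_index` and `query_and` helpers are hypothetical.

```python
from collections import defaultdict

# Hypothetical user records; field names follow the User example in the thread.
users = {
    1: {"name": "John", "age": 43, "country": "Sweden", "job": "SoftwareEngineer"},
    2: {"name": "Anna", "age": 31, "country": "Sweden", "job": "Designer"},
    3: {"name": "John", "age": 19, "country": "Norway", "job": "SoftwareEngineer"},
}

def build_index(records):
    """Map each (key, value) pair to the set of record ids that carry it."""
    index = defaultdict(set)
    for record_id, fields in records.items():
        for key, value in fields.items():
            index[(key, value)].add(record_id)
    return index

def query_and(index, **criteria):
    """Return ids matching every key=value criterion (boolean AND)."""
    id_sets = [index.get((k, v), set()) for k, v in criteria.items()]
    return set.intersection(*id_sets) if id_sets else set()

index = build_index(users)
swedish_engineers = query_and(index, country="Sweden", job="SoftwareEngineer")
names = sorted(users[i]["name"] for i in swedish_engineers)
```

An OR query would be the union of the same sets; the ids that come back can then serve as start nodes for a graph traversal.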
"Is sorting possible at all with graph databases"
Often, you can completely bypass it by encoding order as a linked list. Add a relationship type: NextOldestPerson()
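A minimal sketch of that linked-list idea, in plain Python rather than any graph database API. The `Person` class and the `next_oldest` attribute are stand-ins for nodes and a NextOldestPerson relationship, and the direction (each person points to the next older one) is an assumption.

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
        self.next_oldest = None  # stand-in for a NextOldestPerson edge

def link_by_age(people):
    """Wire up next_oldest edges once, at write time; return the youngest person."""
    ordered = sorted(people, key=lambda p: p.age)
    for current, following in zip(ordered, ordered[1:]):
        current.next_oldest = following
    return ordered[0] if ordered else None

def walk(head):
    """Follow next_oldest edges; no sorting happens at read time."""
    node = head
    while node is not None:
        yield node
        node = node.next_oldest

people = [Person("John", 43), Person("Anna", 31), Person("Erik", 19)]
head = link_by_age(people)
names_in_age_order = [p.name for p in walk(head)]
```

The cost moves to writes: inserting a new person means splicing them into the chain at the right spot, which is the usual trade-off for precomputed order.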
Graph databases usually have an accompanying index. For example, Neo4j is tightly integrated with Lucene.
You can use the index to get the start node, or you can reference the start node directly by using the node ID.
If you have modeled the graph so that each Person node has a link (or "edge" in graph speak) labeled "location" that points to their country, then the Gremlin query to return all software engineers in Sweden would look like this:
This is a Python example from the soon-to-be-released Bulbs persistence framework, which can connect to Neo4j, OrientDB, Dex, etc:
1. Use the Country index to get the node with the name "Sweden".
2. Return a Gremlin query with results of type Person (software engineers) using the specified script.
The script says: starting at vertex "v" (a reference to "self", which in this case is Sweden), return all the incoming vertices connected by an edge labeled "location" and filter them by occupation, keeping those where occupation equals "Software Engineer". ("it" is Groovy's implicit closure parameter, here the vertex currently being tested.)
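The same two steps (index lookup for the start node, then a filtered traversal over incoming "location" edges) can be sketched in plain Python. This is not the Bulbs or Gremlin API; the `Graph` class, its method names, and the sample data are all hypothetical.

```python
from collections import defaultdict

class Graph:
    """Tiny in-memory graph with a name index for looking up start nodes."""

    def __init__(self):
        self.vertices = {}                 # vertex id -> property dict
        self.in_edges = defaultdict(list)  # target id -> [(label, source id)]
        self.country_index = {}            # country name -> vertex id

    def add_vertex(self, vid, indexed=False, **props):
        self.vertices[vid] = props
        if indexed:
            self.country_index[props["name"]] = vid

    def add_edge(self, source, label, target):
        self.in_edges[target].append((label, source))

    def incoming(self, vid, label):
        """Vertices that point at vid through an edge with this label."""
        return [self.vertices[s] for (l, s) in self.in_edges.get(vid, []) if l == label]

g = Graph()
g.add_vertex(1, indexed=True, name="Sweden")
g.add_vertex(2, name="John", occupation="Software Engineer")
g.add_vertex(3, name="Anna", occupation="Designer")
g.add_edge(2, "location", 1)
g.add_edge(3, "location", 1)

# Step 1: use the index to get the start node.
sweden = g.country_index["Sweden"]
# Step 2: walk incoming "location" edges and filter by occupation.
engineers = [p["name"] for p in g.incoming(sweden, "location")
             if p.get("occupation") == "Software Engineer"]
```

Only the people attached to Sweden are ever examined, which is the point of starting from a node rather than scanning every Person.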
The sorting example says: in graph "g", start at vertex 1, return all of the outgoing vertices from vertex 1, sort the results by creation_date in reverse order, and return the results as a list. Gremlin is a domain-specific language written in Groovy, and "it" is Groovy's implicit name for the single argument passed to a closure, here the current vertex.
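For readers who don't know Gremlin, the logic of that sort can be sketched in plain Python. The adjacency map, the dates, and the `out_sorted` helper are made up for illustration and are not part of any graph database API.

```python
from datetime import date

# Hypothetical graph: vertex 1 has outgoing edges to vertices 2, 3 and 4.
out_edges = {1: [2, 3, 4]}
creation_date = {
    2: date(2011, 3, 1),
    3: date(2011, 5, 20),
    4: date(2010, 12, 9),
}

def out_sorted(vertex_id):
    """Outgoing neighbours of vertex_id, newest creation_date first, as a list."""
    neighbours = out_edges.get(vertex_id, [])
    return sorted(neighbours, key=lambda v: creation_date[v], reverse=True)

newest_first = out_sorted(1)
```

The sort runs only over the neighbours of the start vertex, so its cost depends on that vertex's degree rather than on the size of the whole graph.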
KayaDB is a graph database that can do that: it's a labeled directed graph that mimics the relational model, so you can do the things you mentioned and much more. For example, you can query just for John and it returns exact matches in any table/column in O(1).
How solid is TinkerPop these days? I can definitely tell that it'd take care of a lot of work that I'd have to do manually, but I'm also somewhat hesitant to convert to it in case it's buggy and the team responds slowly to bugs.
TinkerPop has a refreshingly solid stack and community around it -- Marko is one of the founders and one of the leading graph gurus around. One of the other founding members is Peter Neubauer, the guy who founded Neo4j.
A question I haven't been able to answer from all this: is there, as of yet, a portable graph database library, à la SQLite, that can be used as a file format for graph-based data?
Rexster (https://github.com/tinkerpop/rexster/wiki/) is a REST server that has Gremlin built in, and it interfaces with most of the graph databases, Neo4j, DEX, OrientDB, etc.
There is an open-source Python persistence framework in the works called Bulbs that connects to Neo4j through the Rexster REST server, and there are binary bindings in the works as well. There is also a Python open-source Web development framework for graph databases called Bulbflow that is based on Bulbs and Flask. Both frameworks should be released in the next few weeks.
I'm not quite sure; I was imagining something with a small C API like SQLite's (which can usually be incorporated into anything), whereas Neo4j requires the JVM. This means, for example, that if I want to write a Ruby app that interacts with the graph database, it has to become a JRuby app. I suppose a service-based architecture could avoid this (where only the process interacting with the graph has to be run on the JVM), but that defeats the whole purpose of using a database library rather than a client-server database. I suppose it works well enough for languages that are already founded on the JVM, such as Clojure—but that doesn't make it universally portable to systems that don't even have a JVM implementation (such as iOS), in the way that SQLite is.
I used Neo4j for a project a while ago. Once you got past the documentation, it was cool to work on and see data transactions happening without having to write code for processing RDBMS data. I haven't looked at it or used it in a while so I can't speak to its current state, but when I used it, there was a bit of a learning curve for getting up to speed. In particular, some of the code examples were a little off, so I had to spend a couple of days researching to figure out why the project compiled but nothing ended up in the database.
Neo4j has recently added support for two new query languages -- Gremlin and Cypher. Gremlin is a domain specific scripting/traversal language written in Groovy, and you can include a Gremlin script in whatever language you're writing code in just like you can include an SQL script in your code. Cypher is brand new and experimental.
I do not think one exists yet. Most of these are built on non-portable stacks and would require you to either build an app on that stack or run a client-server setup.
I am surprised by the large number of databases in the list that are written in Java. I would have expected that something performance-critical, like a DB, would be written in C. Is it typical these days to write a database in Java? Are there any well-known DBs written in Java?
http://en.wikipedia.org/wiki/Graph_database
It'd be nice if the information presented here were eventually merged into the Wikipedia entry.