List of graph databases (graph-database.org)
88 points by helwr on June 25, 2011 | 28 comments


Had to go look this one up just to find out the pros/cons of graph databases vs. SQL and other NoSQL databases.

http://en.wikipedia.org/wiki/Graph_database

It'd be nice if the information presented here would eventually merge with the wikipedia entry.


Graphs are a much more elegant way of storing relational data. With graph databases you don't have to mess with tables or joins -- everything is implicitly joined.

And Neo4j rocks -- store 32 billion nodes (http://blog.neo4j.org/2011/03/neo4j-13-abisko-lampa-m04-size...) with 2 million traversals per second (http://www.infoq.com/news/2010/02/neo4j-10), and you can use Gremlin with it (the graph traversal language), which lets you calculate PageRank in 2 lines.

Neo4j is open source, and the Community Edition is now free (https://github.com/neo4j/community). I recommend pairing it with the TinkerPop stack (http://www.tinkerpop.com/).

For tabular data, relational databases rock. But the relational model doesn't align well with object-oriented programming, so you end up with an ORM layer that adds complexity to your code. And with relational databases, the complexity of your schema grows with the complexity of the data.

The graph-database model simplifies much of this and makes working with the modern-day social graph so much cleaner. Graphs allow you to do powerful things like find inferences inside the data in ways that would be hard to do with relational databases.

Check out Peter Neubauer's introduction to graph databases, how they compare to RDBMSs, and where they stand in the NoSQL movement (http://www.infoq.com/articles/graph-nosql-neo4j).

Marko's website is a great resource too (http://markorodriguez.com/) -- he created Gremlin, the graph query language.


What I always find hard to understand is that when you "query" a graph database, you have to start at a certain node.

User:

  Name: John  

  Age: 43

  Country: Sweden

  Job: SoftwareEngineer

What if I wanted to get the names of all software engineers in Sweden? Do I have to create a type of "job" node and start traversing from the one that is for software engineers?

Also: Is sorting possible at all with graph databases? What if I only want people called John that are older than 20?

Is this too far out of scope for Graph Databases?


"Is this too far out of scope for Graph Databases?"

No, it's just that ad-hoc boolean queries are better suited for other query strategies.

That's not to say that you couldn't use a graph database alongside another strategy. Neo4j, for example, has built-in integration with Lucene. These sorts of boolean logic queries are best addressed by your typical b-tree index.

Note that, in general, you want to prefer relationships over attributes. Depending on the application, you'd probably have something more like this:

  Node types:
    User(name, age)
    Country(name)
    Job(title)

  Relationship types:
    LivesIn()
    HasJob()

To accomplish the boolean AND/OR queries, you'd store the key/value pairs from the User object from your comment in Lucene. But in the general cases of your app, you'd encode the relationships you need for particular, non-ad-hoc queries. Nodes and relationships can be used like pointers to create data structures in your graph. From there, standard algorithms generally apply.
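
To make the split between relationships and indexed attributes concrete, here is a minimal in-memory sketch in plain Python. It is not Neo4j's or Lucene's API: the `Graph` class, its method names, and the sample data are all made up for illustration, and a dictionary of sets stands in for the key/value index.

```python
from collections import defaultdict


class Graph:
    """Tiny in-memory property graph; a stand-in, not a real database API."""

    def __init__(self):
        self.nodes = {}                     # node_id -> properties (incl. "type")
        self.in_edges = defaultdict(list)   # node_id -> [(rel_type, source_id)]
        self.index = defaultdict(set)       # (key, value) -> {node_id}; the "Lucene" part

    def add_node(self, node_id, node_type, **props):
        self.nodes[node_id] = {"type": node_type, **props}
        self.index[("type", node_type)].add(node_id)
        for key, value in props.items():
            self.index[(key, value)].add(node_id)

    def relate(self, source, rel_type, target):
        self.in_edges[target].append((rel_type, source))

    def incoming(self, node_id, rel_type):
        """All nodes pointing at node_id through a rel_type relationship."""
        return {src for rel, src in self.in_edges[node_id] if rel == rel_type}

    def lookup(self, **criteria):
        """Boolean AND over exact key/value matches, answered from the index."""
        sets = [self.index[(k, v)] for k, v in criteria.items()]
        return set.intersection(*sets) if sets else set()


g = Graph()
g.add_node("john", "User", name="John", age=43)
g.add_node("anna", "User", name="Anna", age=31)
g.add_node("sweden", "Country", name="Sweden")
g.add_node("swe", "Job", title="SoftwareEngineer")
g.relate("john", "LivesIn", "sweden")
g.relate("john", "HasJob", "swe")
g.relate("anna", "LivesIn", "sweden")

# Relationship query: start at the Country and Job nodes and follow edges.
engineers_in_sweden = g.incoming("sweden", "LivesIn") & g.incoming("swe", "HasJob")

# Ad-hoc attribute query: answered by the index, no traversal involved.
johns = g.lookup(type="User", name="John")
```

The relationship query never scans the User nodes; it only touches the edges attached to the two start nodes, which is the reason for preferring relationships over attributes for the queries you know in advance.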

"Is sorting possible at all with graph databases"

Often, you can completely bypass it by encoding order as a linked list. Add a relationship type: NextOldestPerson()
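
A rough sketch of that linked-list idea in plain Python, with hypothetical names and data. It assumes NextOldestPerson points from each person to the next person up in age; the direction is a modeling choice, not something fixed by any graph database.

```python
# Hypothetical data: person name -> age.
ages = {"Anna": 31, "John": 43, "Lars": 19, "Maja": 27}

# The NextOldestPerson relationship: person -> next person up in age.
next_oldest = {}


def build_chain(ages):
    """Link people from youngest to oldest and return the head of the chain."""
    ordered = sorted(ages, key=ages.get)
    for younger, older in zip(ordered, ordered[1:]):
        next_oldest[younger] = older
    return ordered[0]


def walk(head):
    """Follow NextOldestPerson edges; no sorting happens at query time."""
    node = head
    while node is not None:
        yield node
        node = next_oldest.get(node)


head = build_chain(ages)
youngest_first = list(walk(head))
older_than_20 = [name for name in youngest_first if ages[name] > 20]
```

The trade-off is that the order is paid for at write time: adding a person means splicing them into the chain at the right position.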


Graph databases usually have an accompanying index. For example, Neo4j is tightly integrated with Lucene.

You can use the index to get the start node, or you can reference the start node directly by using the node ID.

If you have modeled the graph so that each Person node has a link (or "edge" in graph speak) labeled "location" that points to their country, then a Gremlin query can return all software engineers in Sweden.

Here it is embedded in a Python example from the soon-to-be-released Bulbs persistence framework, which can connect to Neo4j, OrientDB, Dex, etc:

  sweden = Country.index.get(name="Sweden")
  script = "v.in('location'){it.occupation == 'Software Engineer'}"
  software_engineers = sweden.gremlin(Person,script=script)

Here's what the code says:

1. Use the Country index to get the node with the name "Sweden".

2. Return a Gremlin query with results of type Person (software engineers) using the specified script.

The script says: starting at vertex "v" (which is a reference to "self", which in this case is sweden), return all the incoming vertices connected by an edge labeled "location" and filter them by occupation ("it" is Groovy's implicit closure parameter, here the vertex being tested), keeping those where occupation equals "Software Engineer".
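
For readers without a graph database at hand, the same two steps can be sketched in plain Python. This is not Bulbs or Gremlin; the vertex ids, the sample data, and the helper names are invented for illustration, and a linear scan stands in for the index lookup.

```python
# Vertices with properties; ids and data are made up for illustration.
vertices = {
    1: {"type": "Country", "name": "Sweden"},
    2: {"type": "Person", "name": "John", "occupation": "Software Engineer"},
    3: {"type": "Person", "name": "Anna", "occupation": "Designer"},
    4: {"type": "Person", "name": "Lars", "occupation": "Software Engineer"},
    5: {"type": "Country", "name": "Norway"},
}

# Directed, labeled edges as (source, label, target).
edges = [(2, "location", 1), (3, "location", 1), (4, "location", 5)]


def in_vertices(vertex_id, label):
    """In the spirit of v.in(label): sources of matching incoming edges."""
    return [src for src, lbl, dst in edges if dst == vertex_id and lbl == label]


def software_engineers_in(country_name):
    # Step 1: find the start vertex (a real system would use an index here).
    start = next(
        vid for vid, props in vertices.items()
        if props["type"] == "Country" and props["name"] == country_name
    )
    # Step 2: follow incoming "location" edges, then filter by occupation.
    return [
        vertices[vid]["name"]
        for vid in in_vertices(start, "location")
        if vertices[vid]["occupation"] == "Software Engineer"
    ]


result = software_engineers_in("Sweden")
```

Only the people attached to the Sweden vertex are examined, so Lars (who lives in Norway) is never visited.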

For more on graph data modeling, see "Knowledge Representation and Reasoning with Graph Databases" (http://markorodriguez.com/2011/02/23/knowledge-representatio...).

Sorting is easy to do using Gremlin. Here's an example:

  g.v(1).out.sort{it.creation_date}.reverse().toList()

This says: in graph "g", start at vertex 1, return all of the outgoing vertices from vertex 1, sort the results by creation_date in reverse order, and return the results as a list. Gremlin is a domain-specific language written in Groovy, and "it" is Groovy's implicit closure parameter (here, each vertex being sorted).
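
The same traversal-then-sort pattern, sketched in plain Python with invented vertex ids and dates (this is an analogy to the Gremlin one-liner, not a translation of any real API):

```python
from datetime import date

# Hypothetical data: outgoing neighbours of vertex 1 and their creation dates.
out_edges = {1: [2, 3, 4]}
creation_date = {
    2: date(2011, 1, 5),
    3: date(2011, 6, 1),
    4: date(2010, 12, 24),
}


def out_sorted_newest_first(vertex_id):
    """Collect outgoing neighbours, then sort them by creation_date, newest first."""
    neighbours = out_edges.get(vertex_id, [])
    return sorted(neighbours, key=creation_date.get, reverse=True)


newest_first = out_sorted_newest_first(1)
```

Note that the sort runs over the traversal results only, so its cost depends on how many neighbours the start vertex has, not on the size of the whole graph.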

Check out this post for more details on sorting (https://groups.google.com/forum/#!searchin/gremlin-users/sor...).


KayaDB is a graph database that can do that: it's a labeled directed graph that mimics the relational model, so you can do the things you mentioned and much more. For example, you can query just for John and it returns exact matches in any table/column in O(1).


Thanks for the information, but I'd rather not store my data somewhere that I can't get to or do local testing and development.


Here is an interview with Marko and Peter that provides a good overview on "Applying Graph Analysis and Manipulation to Data Stores" (http://www.odbms.org/blog/2011/06/applying-graph-analysis-an...).


How solid is TinkerPop these days? I can definitely tell that it'd take care of a lot of work that I'd have to do manually, but I'm also somewhat hesitant to convert to it in case it's buggy and the team responds slowly to bugs.


TinkerPop has a refreshingly solid stack and community around it -- Marko is one of the founders and one of the leading graph gurus around. One of the other founding members is Peter Neubauer, the guy who founded Neo4j.

Someone made this comment the other day comparing the TinkerPop group to how it was in the early days of JServ/Tomcat (https://groups.google.com/d/msg/gremlin-users/pF577035UpY/M7...).


* Editing error: the first line should say "everything is explicitly joined"


A question I haven't been able to answer from all this: is there, as of yet, a portable graph database library, à la SQLite, that can be used as a file format for graph-based data?


TinkerGraph is close (https://github.com/tinkerpop/blueprints/wiki/TinkerGraph) -- it used to be an in-memory only graph, but you can now persist it. Also look at GraphML (http://graphml.graphdrawing.org) and the full TinkerPop stack (http://www.tinkerpop.com).

Gremlin (https://github.com/tinkerpop/gremlin/wiki) is a graph query language, and one of the main discussion groups for general graph database stuff is the Gremlin Users group (https://groups.google.com/forum/#!forum/gremlin-users) -- ask your question about the portable library in there -- the group was created by Marko, the guy who wrote TinkerGraph.

Here's a 10 min screencast on Gremlin: http://www.youtube.com/watch?v=5wpTtEBK4-E

Rexster (https://github.com/tinkerpop/rexster/wiki/) is a REST server that has Gremlin built in, and it interfaces with most of the graph databases, Neo4j, DEX, OrientDB, etc.

There is an open-source Python persistence framework in the works called Bulbs that connects to Neo4j through the Rexster REST server, and there are binary bindings in the works as well. There is also a Python open-source Web development framework for graph databases called Bulbflow that is based on Bulbs and Flask. Both frameworks should be released in the next few weeks.


There's RedStore, "a lightweight RDF triplestore written in C using the Redland library".

http://www.aelius.com/njh/redstore/


You are looking for Neo4j. It is beautiful.


I'm not quite sure; I was imagining something with a small C API like SQLite's (which can usually be incorporated into anything), whereas Neo4j requires the JVM. This means, for example, that if I want to write a Ruby app that interacts with the graph database, it has to become a JRuby app. I suppose a service-based architecture could avoid this (where only the process interacting with the graph has to be run on the JVM), but that defeats the whole purpose of using a database library rather than a client-server database. I suppose it works well enough for languages that are already founded on the JVM, such as Clojure—but that doesn't make it universally portable to systems that don't even have a JVM implementation (such as iOS), in the way that SQLite is.


If you want to write a Ruby app, you just use Neo4j with their REST API.

You are right in all the other points though.


I've used neo4j for a project a while ago. Once you got past the documentation, it was cool to work on and see data transactions happening without having to write code for processing RDBMS data. I haven't looked at it or used it in a while so I can't speak to their current state, but when I used it, they had a bit of a learning curve for getting up to speed. Particularly, some of their code examples were a little off so I had to spend a couple of days researching to figure out why the project compiled but nothing ended up in the database.


Neo4j has recently added support for two new query languages -- Gremlin and Cypher. Gremlin is a domain-specific scripting/traversal language written in Groovy, and you can include a Gremlin script in whatever language you're writing code in, just like you can include an SQL script in your code. Cypher is brand new and experimental.


I do not think one exists yet. Most of these are built on non-portable stacks and would require you to either build an app on that stack or run a client-server setup.


I've used Sones for this, but it does require .NET, so it depends on how portable you need it to be. Within the MS ecosystem it works fine.


Actually, this is a list of graph database management systems.

I was hoping for a list of graph databases, like Knuth's The Stanford GraphBase, which is something completely different.


I am surprised by the large number of databases in the list that are written in Java. I would have expected that something performance-critical, like a DB, would be written in C. Is it typical these days to write a database in Java? Are there any well-known DBs written in Java?


Surprised Dydra isn't mentioned: http://dydra.com


I am not sure how active they are. I asked for a beta invitation a long time ago, no response.


A bit odd that it doesn't mention ZODB. It's a pretty popular/successful graph/object database.



They forgot to list CODASYL.



