BuntDB – Fast, embeddable, in-memory key/value database for Go with geospatial

jimktrains2 · on July 20, 2016

It's not really geospatial; you just have multidimensional indices and an intersect operation. I'm not dissing you; indices like that can be extremely useful and tricky to implement!

That said, why only 4D? 5D is very useful, but no one seems to support it :( (x,y,z,time,value)
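
For illustration, here is a minimal sketch of what an intersect test over N-dimensional boxes might look like. The `Rect` type and `Intersects` function are hypothetical names made up for this example, not BuntDB's API. The point is only that nothing in the overlap check depends on the number of dimensions.

```go
package main

import "fmt"

// Rect is a hypothetical N-dimensional bounding box. Min and Max must have
// the same length, with one entry per dimension.
type Rect struct {
	Min, Max []float64
}

// Intersects reports whether a and b overlap in every dimension.
// Rects with different dimension counts never intersect in this sketch.
func Intersects(a, b Rect) bool {
	if len(a.Min) != len(b.Min) {
		return false
	}
	for i := range a.Min {
		// Disjoint on any single axis means no intersection at all.
		if a.Max[i] < b.Min[i] || b.Max[i] < a.Min[i] {
			return false
		}
	}
	return true
}

func main() {
	// Five dimensions: x, y, z, time, value.
	query := Rect{Min: []float64{0, 0, 0, 100, 0.5}, Max: []float64{10, 10, 10, 200, 1.5}}
	inside := Rect{Min: []float64{5, 5, 5, 150, 1}, Max: []float64{5, 5, 5, 150, 1}}
	tooLate := Rect{Min: []float64{5, 5, 5, 250, 1}, Max: []float64{5, 5, 5, 250, 1}}

	fmt.Println(Intersects(query, inside), Intersects(query, tooLate))
}
```

A point is just a box whose Min and Max are equal, so the same check covers both points and ranges.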

tidwall · on July 21, 2016

FYI, I just added support for up to 20 dimensions.

tidwall · on July 20, 2016

Thank you for the feedback. I like the idea of lifting the cap on the number of dimensions. I hardcoded a limit of 4 to handle the standard XYZM, but the implementation could technically handle around 20.

mappu · on July 20, 2016

Single-file and 1.2Kloc, clearly inspired by Bolt but adds indexes, which is great.

I actually use Bolt in a project at the moment - what's the performance story for general-purpose database use?

tidwall · on July 20, 2016

Thanks for the kind words.

I really like Bolt. It's a wonderful library with a good API. I was inspired by the simplicity of its transaction model.

Both Bolt and Bunt are ACID, and both persist data to disk. The biggest difference between them is that Bolt reads and writes from disk, while Bunt reads and writes from memory (and has an append-only file for durability).

Therefore the amount of data that Bolt can handle is limited by the size of the disk, while Bunt is limited by the amount of RAM.

A general purpose database user will likely see a bump in performance by moving from Bolt to Bunt. What I'm seeing for my projects is about 2x on reads and about 40x on writes. I wrote a Raft store implementation that is a drop-in replacement for the Bolt version. Here's a comparison benchmark: https://github.com/tidwall/raft-boltdb#benchmarks

It really comes down to what you need. Lots of data, or lots of speed.

rakoo · on July 20, 2016

> Both Bolt and Bunt are ACID, and both persist data to disk. The biggest difference between them is that Bolt reads and writes from disk, while Bunt reads and writes from memory (and has an append-only file for durability).

Just to be sure: does this mean that Bunt has a window of time where data is in RAM only, and it is eventually persisted? I ask because the description made me think that BuntDB was purely in-memory. Is there some upper limit on how long an object may be in memory but not yet persisted?

On another note, congrats on this project. I see that you changed the default "Set" to use strings instead of bytes; this was a bit of a pain point when I used BoltDB. Indexes should also be interesting.

tidwall · on July 20, 2016

Bunt is a purely in-memory database, but it also persists to disk so that the database can be reopened. It's a lot like Redis in this manner.

Basically, BuntDB requires that data be persisted prior to completing a transaction. There is no window of time where there is data in memory and not on disk. It's designed so that there is no way for data to exist in memory and not be on disk.
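
To make the ordering concrete, here is a rough sketch of the general "append to a file, sync, then update memory" pattern. This is not BuntDB's actual code or file format: `Store`, `Open`, `Set`, `Get`, and the tab-separated record layout are all made up for illustration, and it assumes an fsync on every write.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// Store is a hypothetical in-memory map backed by an append-only file.
type Store struct {
	data map[string]string
	aof  *os.File
}

// Open replays the append-only file into memory and keeps it open for appends.
func Open(path string) (*Store, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR|os.O_APPEND, 0o644)
	if err != nil {
		return nil, err
	}
	s := &Store{data: make(map[string]string), aof: f}
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		// Each record is "key\tvalue"; keys must not contain tabs or newlines.
		if k, v, ok := strings.Cut(sc.Text(), "\t"); ok {
			s.data[k] = v // later records overwrite earlier ones
		}
	}
	if err := sc.Err(); err != nil {
		f.Close()
		return nil, err
	}
	return s, nil
}

// Set appends the record and syncs it before touching the in-memory map,
// so a value is never visible in memory without first being written out.
func (s *Store) Set(key, value string) error {
	if _, err := fmt.Fprintf(s.aof, "%s\t%s\n", key, value); err != nil {
		return err
	}
	if err := s.aof.Sync(); err != nil {
		return err
	}
	s.data[key] = value
	return nil
}

// Get reads only from memory.
func (s *Store) Get(key string) (string, bool) {
	v, ok := s.data[key]
	return v, ok
}

// Close closes the underlying append-only file.
func (s *Store) Close() error { return s.aof.Close() }

// demo writes a key, closes the store, reopens it, and reads the key back.
func demo() (string, bool, error) {
	dir, err := os.MkdirTemp("", "aof-sketch")
	if err != nil {
		return "", false, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "data.db")

	s, err := Open(path)
	if err != nil {
		return "", false, err
	}
	if err := s.Set("mykey", "myvalue"); err != nil {
		s.Close()
		return "", false, err
	}
	if err := s.Close(); err != nil {
		return "", false, err
	}

	s, err = Open(path) // reopening replays the file into memory
	if err != nil {
		return "", false, err
	}
	defer s.Close()
	v, ok := s.Get("mykey")
	return v, ok, nil
}

func main() {
	fmt.Println(demo())
}
```

Reads never touch the disk here, which is where the read speedup over a disk-backed store would come from; the cost is that everything has to fit in RAM.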

I decided that strings were a better way to go because 1) the string is the most common type in a key/value database, 2) strings take up less memory than a byte slice, and 3) strings are just bytes anyhow so they can always be converted using []byte(str).
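
As a quick illustration of points 2 and 3: with the standard Go compiler, a string header is a pointer plus a length, while a slice header also carries a capacity, so the slice is one word bigger. The `headerSizes` helper below is just a made-up name for this example.

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerSizes returns the in-memory size of a string header and of a
// []byte header, excluding the bytes they point to.
func headerSizes() (uintptr, uintptr) {
	var s string
	var b []byte
	return unsafe.Sizeof(s), unsafe.Sizeof(b)
}

func main() {
	strSize, sliceSize := headerSizes()
	fmt.Println("string header bytes:", strSize)
	fmt.Println("[]byte header bytes:", sliceSize)

	// Converting between the two copies the underlying bytes.
	str := "hello"
	raw := []byte(str)
	fmt.Println(string(raw) == str)
}
```

Note that each conversion allocates and copies, so it is not free if you do it on every access.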

Thanks for the kind words and I hope you give it a try.

rakoo · on July 21, 2016

So, what about this paragraph at the end: https://github.com/tidwall/buntdb#durability-and-fsync

In the default configuration there is a 1 second window where data is not fsynced to disk?

herge · on July 20, 2016

I can understand why they only allow one read/write transaction at a time.

However, could they implement multiple concurrent read/write transactions by having the transaction fail if it writes to any key modified by any other concurrent transaction?

Like if writer X modifies a key at time t1, but writer Y opens a transaction at time t0 and tries to modify the same key at time t2, Y is told their transaction is invalid and should restart their operation from the beginning.
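
A rough sketch of what I mean, with per-key version numbers checked at commit time. All of the names here (`DB`, `Tx`, `Begin`, `Commit`, `ErrConflict`) are hypothetical and have nothing to do with BuntDB's real API. It only detects write-write conflicts and does not track reads.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrConflict tells the caller to restart the transaction from the beginning.
var ErrConflict = errors.New("key modified by a concurrent transaction")

// DB is a hypothetical store that remembers which commit last wrote each key.
type DB struct {
	mu      sync.Mutex
	clock   uint64 // incremented on every successful commit
	data    map[string]string
	version map[string]uint64 // commit number that last wrote each key
}

// Tx buffers writes privately until Commit.
type Tx struct {
	db     *DB
	start  uint64 // value of db.clock when the transaction began
	writes map[string]string
}

func NewDB() *DB {
	return &DB{data: map[string]string{}, version: map[string]uint64{}}
}

// Begin records the current commit number as the transaction's start time.
func (db *DB) Begin() *Tx {
	db.mu.Lock()
	defer db.mu.Unlock()
	return &Tx{db: db, start: db.clock, writes: map[string]string{}}
}

// Set buffers a write; nothing is visible to others until Commit.
func (tx *Tx) Set(key, value string) { tx.writes[key] = value }

// Commit fails if any key this transaction wrote was committed by someone
// else after this transaction began. Otherwise it applies all writes.
func (tx *Tx) Commit() error {
	db := tx.db
	db.mu.Lock()
	defer db.mu.Unlock()
	for k := range tx.writes {
		if db.version[k] > tx.start {
			return ErrConflict
		}
	}
	db.clock++
	for k, v := range tx.writes {
		db.data[k] = v
		db.version[k] = db.clock
	}
	return nil
}

func main() {
	db := NewDB()
	y := db.Begin() // Y opens at t0
	x := db.Begin()
	x.Set("k", "from X")
	fmt.Println("X commit:", x.Commit()) // X modifies the key at t1
	y.Set("k", "from Y")
	fmt.Println("Y commit:", y.Commit()) // Y tries the same key at t2
}
```

The lock is only held for the validation and apply step at commit, not for the whole transaction, so writers touching different keys can overlap.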

jhugg · on July 20, 2016

Sometimes this is slower than serialization. In fact, when you’re doing KV-CRUD work on in-memory data, it’s often slower than serialization. Keeping RW-sets is non-trivial overhead compared to the hardest typical part of KV-CRUD, tree or hash lookups.

Many many systems have more parallelism, but less throughput.

Now, if you want to prevent one transaction with a bad-actor blocking the system, then RW-sets, timeouts and OCC/MVCC might be a good idea, it just won’t be faster.

liotier · on July 20, 2016

Is it common for this sort of database to not expose an interface over IP? It seems to me that a local-only database would severely restrict the use cases, but maybe I'm just ignorant of many local-only uses. Or should another program handle the networking, with BuntDB as a backend?

danielheath · on July 20, 2016

It's common for programs to embed a database engine. Informally, you're doing this every time you write any structured data to a file.

Baking a database into your application drastically simplifies distribution/deployment and avoids network bottlenecks (at the cost of restricting your choice of storage engine, making it harder to hire staff experienced with your tech, etc).

tidwall · on July 20, 2016

There are plans for a frontend application with a simple command interface, and network support.

Though for now it's a Go package that is intended to be imported into projects.