I've had the chance to hear Richard Hipp talk about SQLite yesterday! He mention...

_pgmf · on Nov 8, 2017

I'm jealous you got to hear Dr. Hipp, that sounds cool. Would love to hear more about the circumstances :)

Regarding the LSM engine, you can find all the relevant implementation details here: https://sqlite.org/src4/doc/trunk/www/lsm.wiki#summary

> The in-memory tree is an append-only red-black tree structure used to stage user data that has not yet flushed into the database file by the system. Under normal circumstances, the in-memory tree is not allowed to grow very large.

8ig8 · on Nov 8, 2017

I stumbled on this video presentation by Dr Hipp recently (may have been from another HN comment). I really enjoyed it. Probably because he seems so enthusiastic and passionate.

SQLite: The Database at the Edge of the Network with Dr. Richard Hipp

https://m.youtube.com/watch?v=Jib2AmRb_rk

makmanalp · on Nov 8, 2017

I got lucky! It was a class event, and he's known to speak at databases / data systems classes. He's a very fun (and opinionated!) speaker.

> The in-memory tree is an append-only red-black tree structure used to stage user data that has not yet flushed into the database file by the system.

Hmm, ok, so this contradicts my assumption. Actually, now that I think about it, other LSMs like rocksdb / leveldb work like this too (library-like model with some in-memory component when you "open" the database).

Anyway, without diving into the details of the code, there's other technical decisions that would affect this stuff.

One thing is how big your in-memory structure is (in relation to available memory and insertion workload) and how often you flush to disk is a key thing. Another thing is what your LSM tree looks like - aside from the data structure, how many tiers/levels you have is a big thing. I assume some of these are configurable parameters. E.g. rocksdb has an enormous set of parameters that handles this stuff. It's also annoying to tune.

I found this benchmark here that is illustrative: https://sqlite.org/src4/doc/trunk/www/lsmperf.wiki

The first graph is underwhelming, but when you adjust the buffers (look at the last graph) ~250k writes / second constant regardless of database size (this is why you want an LSM tree) is darned good! And this is on a spinny drive, not an SSD. Their "large" buffer sizes aren't even that large IMHO.

So maybe his mention that the LSM storage was underwhelming was overblown :-) I don't know.

Another difference is with other LSM-based systems that aren't just key-value, it's usually in the context of column stores: you keep a separate LSM for each column family (could be 1-n columns). But I can't think off the top of my head how this would cause a difference. Perhaps in how reads happen - the query engines work quite differently.

Anyway, my talk is cheap, I'm just guessing here, actually doing the analysis is the hard work :-) Also, I'm something of an amateur currently, so take my words with a grain of salt. Anyone else have any ideas re: this?

rurban · on Nov 8, 2017

This is the other perf graph: https://sqlite.org/src4/wiki?name=LsmPerformance

rattray · on Nov 8, 2017

What would be performance for SQLite3 in comparable scenarios? I don't see anything comparative on that benchmarks page.

makmanalp · on Nov 8, 2017

I've gotten it to do around 40-50k inserts / sec but that's a different scenario - nfs drive, different table and indexes, different queries, different configuration, etc etc. Also I didn't know if he meant that the inserts were disappointing or the overall results were (e.g. a suite of tests including reads / writes of all sorts).

X86BSD · on Nov 8, 2017

I just want to hijack this thread to say hipps other creation Fossil SCM is a great SCM. Better than git imo. Everyone should check it out.

petre · on Nov 8, 2017

We've been using it for the last 3/4 years. It's great and really user friendly, with integrated help and a web interface - everything in a single binary. The trouble with it is that it gets slower once you have a lot of history and many files. I don't know if you can use it for huge things like the Linux kernel or the FreeBSD ports tree. I once tried to import the ports tree into fossil and gave up after 2G and an hour. It will import anything that can do git fast-export. Now it also imports svn dumps as well. Fossil is a very good replacement for SVN. You can set up a central repo where everyone syncs on commit and update.

RainaRelanah · on Nov 8, 2017

To be fair, Fossil's intended use case is the exact opposite of the Linux kernel. See 3.3 and 3.4 of the fossil vs git page[0].

[0] https://www.fossil-scm.org/xfer/doc/trunk/www/fossil-v-git.w...

qznc · on Nov 8, 2017

I find most of the differences listed there contrived.

One big difference is that fossil includes wiki and ticketing.

Philosophy differs: Fossil intentionally limits "distributedness". For example, fossil push/pull always transfers all branches with their name. Private branches are considered an anti-feature.

Minor differences are the licence (GPL vs BSD) and the data store (files vs sqlite). Under some circumstances these details matter, but not for the majority of developers.

The rest is not significant, imho. For example, "Lots of little tools vs stand-alone executable". Who cares? In both cases you type "$VCS $SUBCOMMAND $PARAMETERS" commands.

RainaRelanah · on Nov 8, 2017

You're right, philosophy differs. I generally dislike private branches. It goes back to the origin of git -- intended for the linux kernel, one of the most widely used open source projects with tens of thousands of contributors. Linus doesn't want or need to see a million private branches. None of the projects I work on are of that scale. When your team is under a dozen people, being able to see what your coworkers are playing with in real time (autosync) is actually incredibly useful.

Stand-alone executable is pretty significant. Git is available on most servers -- fossil is not. If it's packaged in your OS, it's often outdated. Stand-alone kinda makes up for this as you can easily get the latest version with a wget & chmod on any computer, on all 3 platforms.

As for sqlite, it is an astoundingly solid rdbms that is well battle-tested. I consider that a big difference.

interfixus · on Nov 8, 2017

Why does everybody persist in calling the great man Dr.?

He refers to himself as D. Richard Hipp.

ptrott2017 · on Nov 8, 2017

D is an initial but Dr. D. Richard Hipp has a PhD from Duke - graduated in 1992 (his thesis can be found here - its worth a read: http://bit.ly/2ygiDWx ) hence the Dr.

interfixus · on Nov 8, 2017

Thank you. Always thought it was a weird comprehension error from the slightly unusual character-combo.

The weird error may have been mine.

makmanalp · on Nov 8, 2017

Ooops!

thesz · on Nov 14, 2017

I did something like that - LSM backend for OO-relational DBMS (inhouse).

I also hoped for big win on the insertion-heavy loads, and I also haven't succeed in that. The problem is that every insert statement must read back something from DB to verify DB state against schema for correctness. As reads in LSM are slower, the net win is either absent or negligible. I have to say I wrote "must" in sentence above because you sometimes can get away without reading back, but not always. In the end, worst case scenario is always "read and write", not just "write".

But!

I devised a scheme to lay out layers' data so that they are as contiguous as they can be. Or get a very good approximation to that contiguousness, basically (O(1) "pages" per level). Thus contiguous reads got very high performance and beat old storage on read scheme, despite the need of level merging, etc.

katastic · on Nov 8, 2017

Perhaps I'm misunderstanding you, but sqlite supports ramdisks/"In-memory databases"

https://www.sqlite.org/inmemorydb.html

Perhaps the key (ha!, pun) is that you're talking about using RAM _and_ disk with the RAM being for caching/fast access that eventually hits the disk. Whereas, I think, in this case sqlite is either on the disk, or in RAM. There is no multiple tiers.

Correct me if I'm off here. Thanks.

makmanalp · on Nov 8, 2017

> Perhaps the key (ha!, pun) is that you're talking about using RAM _and_ disk with the RAM being for caching/fast access that eventually hits the disk.

This is closer to what I mean but not quite. Specifically, in-memory (or main-memory) is a technical term that talks about a specific type of database that focuses on doing operations /mostly/ in memory, with some spillover to disk as necessary, but only as an edge case. You usually then handle persistence with a sequential transaction log, perhaps in NVDIMM storage if you have some. This is in contrast to other systems where the memory is a buffer, but the "actual stuff" happens on disk. There is of course many hybrid schemes - it's really more of a spectrum. Some examples: SAP HANA, ArangoDB, MapD, MemSQL, MS SQL Server Hekaton.

There's also a lot of techniques that come with this style of thinking - often these are column stores, and often since these databases don't really hit disk, the bottleneck is moving data in and out of memory to the CPU caches, and sometimes just CPU speed. Often these use lightweight compression and SIMD instructions to tackle those problems.

So SQLite's "in-memory" database scheme doesn't quite count as something like this, it's more of a disposable database. But that's ok - it's not bad, it's just a different thing.

> Whereas, I think, in this case sqlite is either on the disk, or in RAM. There is no multiple tiers.

See the thread above - I kind of assumed this was the case, but like you mentioned, it looks like with the LSM implementation it does kind of sort of have a memory component - it lasts until you close the database "connection", and gets flushed often.

infogulch · on Nov 8, 2017

Actually in WAL mode with `PRAGMA synchronous = NORMAL`, it only flushes to disk periodically, not after every transaction.