Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

So on eway to think of a hash table/map is a set of tuples of <key, value>. Imagine we extend that to <insertion timestamp, key, value>. All of your conventional mapping methods (get/has/put) work on the key and value and ignore the timestamp. But anything that relies on iteration takes advantage of the timestamp[1], and iterates in timestamp order (or, in fact how this is normally done, is that you have a sparse array/map of <key, pointer> and a dense array of <timestamp, value>. Whenever you insert a new key/value, you always append value to the end of the dense array (so that array is dense), and the key is hashed as normal, but the value in the hash table is a pointer into the dense array where the timestamp/value pair is stored. So internally, get is

    return (map[hash(key) % map_size])->value
or approximately that (I haven't written C in a while).

Then iteration over objects in the array is consistent: you just iterate over the dense array. Removal from the dense array is done by tombstoning, and possibly eventual compaction. This is essentially how python's current hash table implementation works.

IIRC, if you're table can assume only insertions, this actually becomes really, really nice as a persistent data structure, since you can replace the backing vector with a backing linked list, and you only really lose out on iteration speed. Then you further extend that by swapping the linked list to a tree, and you share the entire backing structure.

[1]: And you can use an increment-only mutation counter instead of a timestamp to make this pure.



Correct me, if I am wrong: take everything you described and remove the (logical) timestamp, and all you'd be losing out on would be the iteration in insertion order?

So how does the timestamp have any impact on persistence?

(We agree on basically everything you write.)

> [1]: And you can use an increment-only mutation counter instead of a timestamp to make this pure.

Agreed. That's what I'd call a logical timestamp or a logical clock.

> IIRC, if you're table can assume only insertions, this actually becomes really, really nice as a persistent data structure, since you can replace the backing vector with a backing linked list, and you only really lose out on iteration speed. Then you further extend that by swapping the linked list to a tree, and you share the entire backing structure.

That still doesn't tell you anything at all about how you make the hashtable itself persistent.

One simple way would be to just introduce an arbitrary order on your hashes, eg compare them as if they were ints, and then stick them in an ordered map container. Of course, that's just a round-about way to impose an arbitrary order on your keys. Works perfectly well, it's just not too interesting.

To come back to your original question:

> What's the value of a sorted map over an ordered map? > I don't think I've ever cared to iterate over map values based on the alphanumeric ordering of their keys.

In the cases we talked about making the keys comparable is only a means to an end, and any arbitrary order will do. That's the most common use case. Iterating over the keys in insertion order is also often useful.

In practice, I did come across some use cases that make genuine use of the order of keys. Eg when I was interested in (automatically) keeping track of the highest or lowest key or quantiles.

You can use a sorted map as a priority queue easily. A min heap would give you O(1) access to the minimum item, but if you actually want to pop it, it's O(log n) anyway. A sorted map as tree gives you O(log n) for basically all operations. For most uses that's either good enough, or doesn't even make a difference at all compared to a heap. Using your sorted map as a priority queue gives you arbitrary priority updates in O(log n) for no extra implementation complexity.

Sorted maps also came in handy when I was implementing geometric algorithms where I wanted to sweep a scanline over points. Or for divide and conquer algorithms over space.

A sorted map is also useful as a basic datastructure to build step functions on top of. (https://en.wikipedia.org/wiki/Step_function) A simple application is solving the infamous skyline problem.

When you want to store your data structure, having similar keys close together can help with compression. Google uses sorted string tables (sstable) show that principle quite well.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: