"The other feature you quickly find yourself needing is the ability to use arbitrary objects as keys in a hash map. In languages that don’t support that (Perl, Python, Go, JavaScript until recently)"
I don't think that's right. Python and Go do effectively support that. Each has their own take on the problem of being unable to use mutable keys, but I index by some sort of object all the time. Makes iteration over hashes substantially more useful when you get objects for both hashes and keys.
Back in the Python 2.0 era, before it grew all the other features that were advantages over Perl, before Python even had generators/iterators, this was one of the big reasons I preferred Python over Perl. Only being able to index by strings wasn't necessarily a huge stopper in theory, but in practice it was very dangerous, because you really need to encode your keys. It was extremely common to see key parts simply concatenated around a delimiter, when what you need is to run every part through something like pipeEncode first to be safe, where pipeEncode can be as simple as a routine to backslash encode pipe and backslash characters (although no simpler than that; you must do both of them to be safe, that's the minimum). This could result in non-trivial bugs and even security issues when you start encoding things like $object . "|" . $permission and hackers start sticking pipes in the names of objects.
But between the inconvenience of that if you know what you're doing, and the pile of code you'll encounter from developers who don't know why that's important (and may even argue in code review), it was definitely the sort of thing that grinds you down over time.
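To make the failure mode concrete, here is a minimal sketch in Python (rather than Perl, purely for illustration). The helper names naive_key, pipe_only_key and safe_key are hypothetical, and pipe_encode is just one assumed way to write the backslash-escaping routine described above, not code from any real system.

```python
def pipe_encode(part: str) -> str:
    """Backslash-escape backslashes first, then pipes (assumed implementation)."""
    return part.replace("\\", "\\\\").replace("|", "\\|")


def naive_key(*parts: str) -> str:
    # Plain concatenation with a delimiter: the flawed but common approach.
    return "|".join(parts)


def pipe_only_key(*parts: str) -> str:
    # Escapes the delimiter but not the escape character: still ambiguous.
    return "|".join(p.replace("|", "\\|") for p in parts)


def safe_key(*parts: str) -> str:
    # Escapes both characters, so distinct part lists give distinct keys.
    return "|".join(pipe_encode(p) for p in parts)


# A pipe inside an object name shifts the boundary between the parts.
naive_collision = naive_key("report|admin", "read") == naive_key("report", "admin|read")

# Escaping only the pipe can still collide once backslashes show up.
pipe_only_collision = pipe_only_key("x\\", "y|z") == pipe_only_key("x|y\\", "z")

# With both characters escaped, neither pair collides.
safe_collision = (
    safe_key("report|admin", "read") == safe_key("report", "admin|read")
    or safe_key("x\\", "y|z") == safe_key("x|y\\", "z")
)

print(naive_collision, pipe_only_collision, safe_collision)
```

The middle case is the one people argue about in code review: escaping just the delimiter looks sufficient until a value ends in a backslash.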
In Python it was safe to
print(hash[(keyPart1, keyPart2)])
and it was consistent and safe. Or use an object, etc.
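A small sketch of why the tuple version sidesteps the whole problem. The permission table and its values are made up for illustration.

```python
# Hypothetical permission table keyed by (object name, permission) tuples.
permissions = {}
permissions[("report|admin", "read")] = "granted"
permissions[("report", "admin|read")] = "denied"

# The two keys stay distinct: there is no delimiter for a pipe to be confused with.
distinct_entries = len(permissions)

# Iterating yields the original parts back, with no decoding step.
object_names = sorted(name for name, _permission in permissions)

print(distinct_entries, object_names)
```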
It looks like your "pipeEncode" is generating a string that reflects your desired key equality function. I mention this approach in the blog post, but I consider it a poor substitute for being able to use arbitrary objects and specify the equality function on the map or set.
The issue of mutable keys is a slightly different one. If you mutate any of the properties of your object that are used by the map you are going to have a bad time, so don't do that. And I guess if your maps are sufficiently simple (eg JS's object-identity maps) then the user can't make that mistake, but at what cost?
If they are generating strings as keys and they mutate the object after creating the string then this will also break so they haven't even really avoided the problem.
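Roughly what "a bad time" looks like in Python, as a sketch. The Point class is hypothetical; it derives equality and hash from mutable fields, which is exactly the thing not to do. The string-keyed dict stands in for the string-key workaround.

```python
class Point:
    """Hypothetical key type whose equality and hash depend on mutable fields."""

    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self) -> int:
        return hash((self.x, self.y))


p = Point(1, 2)
places = {p: "home"}
string_places = {f"{p.x},{p.y}": "home"}  # the string-key workaround

p.x = 99  # mutate a field that the map depends on

# The entry is still stored, but a fresh key with the old value no longer finds
# it: the stored hash matches Point(1, 2), yet == against the mutated p fails.
still_stored = len(places) == 1
found_by_old_value = Point(1, 2) in places

# The string key has the same problem in a different shape: it now describes
# a state the object is no longer in.
found_by_current_string = f"{p.x},{p.y}" in string_places

print(still_stored, found_by_old_value, found_by_current_string)
```

Looking up p itself will usually fail too, since its hash has changed, but whether it does depends on implementation details of the dict, so the sketch doesn't check it.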
"I consider it a poor substitute for being able to use arbitrary objects and specify the equality function on the map or set"
My point is that it's an even worse substitute than most programmers realize, because to use it properly you have to understand how to encode parameters. The thing that people usually use, string concatenation with some delimiter, is fundamentally flawed.
(My favorite... and, alas, I've inherited a system that uses this, though fortunately it hasn't surprised me yet... is using "underscore" as a delimiter, for values whose names routinely include underscores! Fortunately, nothing ever tries to extract the original values from the key programmatically, and it's not really in a place hackers are going to attack. But still... yeesh.)
The one exception I've sometimes made is that if you happen to be in an environment where you know a certain value will never be used, you can use that as the delimiter; I've used ASCII NUL for that a few times. But you have to be sure that it's not just a "weird" value that "nobody would ever use", but something truly excluded by the context, something that regardless of what is input by someone somewhere is completely impossible to ever get to your code. Generally, the characters you can type on a keyboard are not a good idea.
Go does not support using some types as map keys, including slices, functions, and other maps. Slices are the most annoying, and I have seen plenty of code that uses fmt.Sprint(s) as a workaround. Fortunately the compiler now recognizes when you convert a []byte to a string to look up a map key, and will not allocate a new string.
In Go it depends if the type is "comparable"; i.e. if "==" works.
You can use arrays in Go, e.g. map[[2]int]string, and you can also use channels, although I'm not sure off-hand exactly what the rules for comparing channels are (I'm struggling to come up with a scenario where this would be useful, actually).
The big problem with slices and maps is that they can be modified. That is, what happens if you modify a slice after you used it as a map key? In slices this is worse than with maps because the backing array can change if you run out of cap space. And also, do you compare by value or identity? And again, what happens if either changes?
I'm not sure if it's possible to come up with a set of rules that wouldn't take people by surprise in at least some cases.
Python doesn't either. I've corrected my post above to index by a tuple, because what I originally had was wrong:
>>> d = {}
>>> d[[1,2]] = 10
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
This is why I qualified my statement with "Each has their own take on the problem of being unable to use mutable keys,".
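For completeness, the usual Python workaround for the TypeError above is to freeze the list into a tuple before using it as a key. A minimal sketch, with made-up values:

```python
d = {}
coords = [1, 2]

try:
    d[coords] = 10  # lists are mutable, so Python refuses to hash them
    list_key_rejected = False
except TypeError:
    list_key_rejected = True

d[tuple(coords)] = 10  # an immutable snapshot of the list works as a key

coords.append(3)  # later changes to the list do not affect the stored key
still_found = d[(1, 2)]

print(list_key_rejected, still_found)
```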
Go can be consistent in a simple way because of its type system: it can see whether any part of a key has something in it that can't be hashed: https://play.golang.org/p/rf8IqPb76Em
Python, as befits Python, has default behavior for instances that I believe is "is" equivalency, but you can override that with various double-underscore methods to do whatever.
>Python, as befits Python, has default behavior for instances that I believe is "is" equivalency
Python dicts consider two keys the same if they have the same hash value and are "=="-equal. So __eq__ and __hash__ are the dunder methods to finagle. Python's sets are the same way.
A useful example is with the pathlib library.
from pathlib import Path
p = Path('a.txt')
q = Path('./a.txt')
p is q # False
{p, q} # Only one element
"q" was built from a different string than "p", but pathlib normalizes both to the same path. So not considering them distinct values is a good decision for this package.
p = Path('a.txt')
q = Path('a.txt')
p is q # False
This is False since p and q are different objects, and "is" checks whether the objects are the same (this can be interpreted approximately as "have the same memory address"), so it'll almost never be true for separately constructed objects. And in the cases where it is (2 is 2), you shouldn't rely on it, as most of them are implementation optimizations.
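The same distinction can be shown without pathlib. In this sketch both classes are hypothetical: Plain keeps Python's default identity-based equality and hash, while Name overrides __eq__ and __hash__ to compare case-insensitively.

```python
class Plain:
    """No __eq__/__hash__ overrides: instances are equal only to themselves."""


class Name:
    """Hypothetical key type that compares case-insensitively."""

    def __init__(self, text: str) -> None:
        self.text = text

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Name) and self.text.lower() == other.text.lower()

    def __hash__(self) -> int:
        # Must agree with __eq__: equal objects need equal hashes.
        return hash(self.text.lower())


a, b = Name("Alice"), Name("ALICE")

same_object = a is b                     # distinct objects
names_in_set = len({a, b})               # == and hash agree, so the set keeps one
plains_in_set = len({Plain(), Plain()})  # default behavior: two distinct keys

print(same_object, names_in_set, plains_in_set)
```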