Hashes of phone numbers is unfortunately not actually useful in almost any circu...

popcalc · on Nov 1, 2022

This is the same reason hashing an SSN is purely security through obscurity. Anyone with a couple of GB of space to spare for a text file can easily perform a reverse lookup.

https://gist.github.com/stouset/4322307
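A minimal sketch of that reverse lookup, assuming the values were hashed with plain unsalted SHA-256 (an assumption for illustration, not something the linked gist is known to use). The function names are hypothetical, and only a toy slice of the 9-digit space is enumerated so it runs instantly:

```python
import hashlib


def sha256_hex(value: str) -> str:
    """Unsalted SHA-256 of an ASCII string, as a hex digest."""
    return hashlib.sha256(value.encode("ascii")).hexdigest()


def build_reverse_table(start: int, stop: int, width: int = 9) -> dict[str, str]:
    """Map hash -> zero-padded candidate for every integer in [start, stop)."""
    table = {}
    for n in range(start, stop):
        candidate = f"{n:0{width}d}"
        table[sha256_hex(candidate)] = candidate
    return table


# Toy slice of the keyspace; a real attack would cover all 10**9 values.
table = build_reverse_table(123_450_000, 123_460_000)

# A "leaked" hash of a made-up value is reversed with a dictionary lookup.
leaked = sha256_hex("123456789")
recovered = table.get(leaked)
print(recovered)
```

The same loop over the full range only costs more disk and time; nothing about the hash itself gets in the way.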

lotsofpulp · on Nov 2, 2022

This is why any business that uses SSN as authentication should be liable for any losses that result from fraudulent usage of the SSN, as opposed to the SSN’s owner being liable.

qxmat · on Nov 1, 2022

Slow hash function + salt would solve this, e.g. you'd be lucky to do more than 10 hashes a minute with bcrypt and 20 salt rounds.
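A rough sketch of the tunable-cost idea, using PBKDF2 from Python's standard library as a stand-in because bcrypt is not in the stdlib (so this is not bcrypt itself). The `slow_hash` name and the mapping of a cost of `c` to `2**c` iterations are assumptions made for illustration:

```python
import hashlib


def slow_hash(number: str, salt: bytes, cost: int) -> str:
    """Derive a digest whose work doubles with each increment of `cost`."""
    iterations = 2 ** cost  # assumed mapping, mirroring a log2 work factor
    digest = hashlib.pbkdf2_hmac("sha256", number.encode("ascii"), salt, iterations)
    return digest.hex()


# Low cost so the example runs quickly; raising it slows every call.
print(slow_hash("+12025550123", b"hypothetical-salt", 10))
```

Whatever cost is picked applies to every call, including each legitimate one made by the server.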

m4jor · on Nov 1, 2022

Yeah but with hashcat supporting cracking with multiple GPUs, even bcrypts can be cracked quickly now. There are also a ton of cloud cracking services like GPUHash.me and entire cracking forums where ppl crowdsource and help out like HashKiller.

kadoban · on Nov 1, 2022

You can try that, but it's really difficult to tune so it's useful. The amount of time the server has to waste computing hashes is too close to the amount of time an attacker has to waste to break at least some of them.

It's just not hard enough to guess a potentially valid phone number. With passwords, hashing only helps because the probability of a valid password is _very_ low, and because you don't need to look up a password, only check if it's the right one for joeblow (so you can salt them individually).

gerdesj · on Nov 1, 2022

"You can trivially reverse them by iterating through every phone number and computing the hash."

Well yes and no. What exactly is your understanding of a phone number 8)

Not everyone is blessed with the NANP. I'm a Brit and we have an eye-wateringly complicated nonsense of a numbering plan and ours isn't the worst.

What do you hash? Perhaps the standardised international representation or one of them (no that is not a joke - telephony is weird). For a laugh you could try one of the many colloquialisms. For example a UK number might be 00441395112233 or 441395112233 or +44 (0)1395 112233 - the final part might be displayed as 112 233 or 112-233. Imagine if the database works by operating on all numbers in locally correct colloquial mode and hashes that!

Now let's really get silly: There are hashes that are nasty to compute but easy to check and vice versa. We'll use whatever is indicated.

Anyway this is all a very well researched problem, there is no need for silly games: passwords.

ajsnigrutin · on Nov 2, 2022

This data is normalized before it's even saved to the database.

You cannot send an SMS to "+44 (0)1395 112-233", so they remove the stuff in parentheses, the dashes, spaces, etc. first, and then store.
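A minimal sketch of that kind of normalization, covering only the UK spellings quoted upthread. `normalize_uk_number` is a hypothetical helper, not any particular service's code, and a real system would more likely use a full numbering-plan library:

```python
import re


def normalize_uk_number(raw: str) -> str:
    """Reduce common UK spellings to a single +44... form (illustrative only)."""
    s = raw.strip()
    s = re.sub(r"\(0\)", "", s)    # drop the optional trunk prefix "(0)"
    s = re.sub(r"[^\d+]", "", s)   # keep only digits and '+'
    if s.startswith("00"):
        s = "+" + s[2:]            # international dialing prefix
    elif s.startswith("0"):
        s = "+44" + s[1:]          # national format, assumed to be UK
    elif not s.startswith("+"):
        s = "+" + s                # bare country code
    return s


for spelling in ("00441395112233", "441395112233",
                 "+44 (0)1395 112233", "+44 (0)1395 112-233"):
    print(spelling, "->", normalize_uk_number(spelling))
```

All four spellings collapse to the same string, so an attacker only has to enumerate the normalized form.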

kadoban · on Nov 1, 2022

Phone numbers get complicated, yeah, but US numbers are pretty trivial (and so are numbers in several other places, and even for the UK it's just more annoying, not really computationally harder).

So at _best_ the security analysis is: "okay, all US phone numbers and a bunch from other places might as well be in cleartext", which is already broken enough that it's basically useless.

groffee · on Nov 1, 2022

So normalise the data first? Your comment literally makes no sense at all.

addingadimensio · on Nov 1, 2022

Hash and salt

parker_mountain · on Nov 1, 2022

While a secret salt is effective in the short term, it's an un-rotatable value. Which means, if the salt gets leaked, you are screwed (or rebuilding the entire table by brute forcing it, or adding another layer of salting - not great!).

For a company operating at Facebook's scale, with their kind of scrutiny around handling PII, this is unfortunately functionally useless.

For some data types where hashing isn't super effective, and where associative identifying information is attached (such as a user id), a more effective mechanism might be to encrypt the data with a strong random value appended, and decrypt to do the lookup. This would require a correctly provisioned HSM to do properly - the private key secrets should NEVER be exported.

While hashing seems like a good idea, it's actually particularly and deceptively tricky for these kinds of use cases.

lost_tourist · on Nov 2, 2022

I would say it's better than doing nothing though.

parker_mountain · on Nov 4, 2022

It's complicated. Your local security engineer might be wringing their hands about this. Definitely an avoid at any cost kinda situation.

kadoban · on Nov 1, 2022

If you salt, then either you can't look up a number, or you've only changed the problem to: iterate over all the possible phone numbers, _add the salt_ and hash them. No big difference.
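Both halves of that trade-off can be sketched in a few lines. Everything here is an assumption for illustration: SHA-256 as the hash, a toy keyspace of 10,000 made-up numbers, and the names `salted_hash`, `candidates`, and `find`:

```python
import hashlib
import os


def salted_hash(salt: bytes, number: str) -> str:
    return hashlib.sha256(salt + number.encode("ascii")).hexdigest()


def candidates():
    """Toy keyspace: 10,000 made-up numbers instead of every valid one."""
    for n in range(10_000):
        yield f"+1202555{n:04d}"


target = "+12025550123"

# Case 1: one shared salt. Lookup is a single hash plus a set membership test...
shared_salt = b"hypothetical-shared-salt"
index = {salted_hash(shared_salt, target)}
can_lookup = salted_hash(shared_salt, target) in index

# ...but anyone holding the salt enumerates exactly as before.
recovered = [c for c in candidates() if salted_hash(shared_salt, c) in index]

# Case 2: a random salt per record. There is no index to probe any more:
# finding a number means rehashing it against every stored record.
records = [(s, salted_hash(s, target)) for s in (os.urandom(16),)]


def find(records, number):
    return [i for i, (salt, digest) in enumerate(records)
            if salted_hash(salt, number) == digest]


print(can_lookup, recovered, find(records, target))
```

With the shared salt the lookup stays cheap and so does the attack; with per-record salts the lookup turns into a scan of the whole table.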

nemothekid · on Nov 1, 2022

The salt doesn't buy you anything, given that Facebook also knows the salt.

ohbtvz · on Nov 1, 2022

There are only about 3 billion valid US phone numbers. How many hashes can your GPU compute per second?

parker_mountain · on Nov 1, 2022

Back-of-the-envelope math, and some benchmarking, suggests that a consumer laptop GPU from about 2015 could bang it out in a month. And, that's being (extremely) pessimistic.

(Assuming a GPU takes 0.001 s to do a SHA-3 hash, which is more than double the actual benchmarks).

I would estimate that a single, high end GPU from the last or current generation could probably chew through it in under a week.
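The arithmetic behind those estimates, using only the figures quoted in this thread (about 3 billion candidate numbers, 0.001 s per hash). The function names are made up for the sketch:

```python
SECONDS_PER_DAY = 86_400


def days_to_exhaust(keyspace: int, seconds_per_hash: float) -> float:
    """Days needed to hash every candidate once at a fixed per-hash cost."""
    return keyspace * seconds_per_hash / SECONDS_PER_DAY


def rate_needed(keyspace: int, days: float) -> float:
    """Hashes per second required to cover the keyspace in `days`."""
    return keyspace / (days * SECONDS_PER_DAY)


keyspace = 3_000_000_000  # figure quoted upthread

print(f"{days_to_exhaust(keyspace, 0.001):.1f} days at 0.001 s per hash")
print(f"{rate_needed(keyspace, 7):.0f} hashes/s to finish within a week")
```

So the "month" figure corresponds to roughly 1,000 hashes per second, and "under a week" needs only about five times that rate.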

DenisM · on Nov 2, 2022

One can easily devise a hash function that a GPU can only compute once per second. Or per year, even, although that would be impractical.

m4jor · on Nov 1, 2022

Most people crack with multiple GPUs. For example, I have a 5 GPU (3080s) rig that I used for mining ETH but now can use to crack with hashcat. tl;dr crack fast af boiii.

galeaspablo · on Nov 1, 2022

How could I match an incoming unhashed value to an existing salted hash?

m4jor · on Nov 1, 2022

hashcat