- hash pseudo-random-generator sha-1 md5 probability
- Updated Thu, 16 Jun 2022 03:16:46 GMT

Using MD5 or SHA1 for instance, and applying integers (as seed so to speak) to the hash function, in sequence, and only keeping, say, the first 64 bits of the resulting hash, do we always have a probability close to $1/{2^{64}}$ to have $$\text{hash}_{64}(n) = \text{hash}_{64}(n+1)$$ for every $n\in\big[1,2^{64}]$?

Caveat: that answer is written with my mind in Vulcan mode. That is, I'm answering the question as taken literally (in the first part); or in only a minor variant (second part); and ignoring the question's title, which I find unrelated, especially given the limited span of $n$.

The probability considered is $0$ for each of the many different common ways to define $\text{hash}_{64}(n)$ as the first 64 bits of the MD5 or SHA-1 of integer $n$. Whether $0$ is *"close to $1/2^{64}$"* is a matter of opinion.

For a few other definitions of $\text{hash}_{64}(n)$, that probability is $1$, which is not *"close to $1/2^{64}$"* by many accounts.

Justification: it is considered the probability $p$ that for *every* $n$ in $\big[1,2^{64}\big]$ it holds $\text{hash}_{64}(n)=\text{hash}_{64}(n+1)$.

For fixed function $\text{hash}_{64}$ this is a probability that a certain determined fact holds or not, and thus $p$ is either $0$ or $1$ depending on the function $\text{hash}_{64}$. This also holds (perhaps with a different $p$) if we change *every* to *some*.

For the many different common ways to define the function $\text{hash}_{64}$ for an integer parameter as the first 64 bits of the MD5 or SHA-1 of that integer, that probability is $0$. For example, if we define $\text{hash}_{64}$ to be the first 64 bits of the SHA-1 of the shortest big-endian decimal representation of $n$ encoded in ASCII, then $\text{hash}_{64}(1)$ is the SHA-1 of the single byte 0x31, that is b65898410c_{h}, while $\text{hash}_{64}(2)$ is the SHA-1 of the single byte 0x32, that is da4b0b0c_{h}, thus they do not match.

We could try with MD5 and SHA-1, various common bases (including 2, 8, 10, 16, 64, 256, 2^{32}), common suitable alphabets (ASCII, EBCDIC, binary, bytes) and common variants (uppercase or lowercase hex, various base64 idioms), endianness (big, little, various mixed), and width for $n$ (fixed to say 64 with some exception for $n=2^{64}$, 65, 72 or 96 bits, adaptive according to $n$ with various steps like even which is common with hex). But I'm not convinced that we would find any such $\text{hash}_{64}$ with $\text{hash}_{64}(1)=\text{hash}_{64}(2)$. And I'm ready to bet the house $\text{hash}_{64}(2)=\text{hash}_{64}(3)$ won't hold in addition.

When we average the probability asked over all the possible definitions of $\text{hash}_{64}$, we get $2^{\left(64-2^{70}\right)}$. That's a ludicrously small number, yet not zero.

Now perhaps we should instead consider the probability $p$ that for a random $n$ in $\big[1,2^{64}\big]$ it holds $\text{hash}_{64}(n)=\text{hash}_{64}(n+1)$.

That probability $p$ is $k/{2^{64}}$ for some integer $k$, which can be computed with significant but feasible effort from the exact definition of $\text{hash}_{64}$.

When we average $p$ over all the possible definitions of $\text{hash}_{64}$, we get $1/2^{64}$; thus *"close to $1/2^{64}$"* for sure.

We can also define the probability $p_k$ that $p$ is $k/{2^{64}}$ for a random function $\text{hash}_{64}$, which is a plausible model for the various $\text{hash}_{64}$ that the question considers. The largest $p_k$ is $p_1$ (meaning $p=1/{2^{64}}$ is the most likely), with $p_0$ extremely close second, both extremely close to $1/e\approx36.8\%$. $p_2$ is about half that, and $p_k$ goes down fast as $k$ grows, down to $p_{2^{64}}=2^{\left(-2^{70}\right)}$, then $p_k=0$ for $k>2^{64}$. It holds $1=\sum p_k$.

We see that $p$ is in the 3-element set $\{0,1/2^{64},1/2^{63}\}$ with probability about $5/2e\approx92.0\%$. We can say $p$ is very often *"close to $1/2^{64}$"*.

External links referenced by this document: