Cryptography
one-time-password probability
Updated Sat, 03 Sep 2022 22:04:57 GMT

What is / How would I calculate the number of bits(?) I need to make the probability of a collision very small?


If I wanted to create a TOTP-esque algorithm that generated a string n characters long, with each character being a base64 character, generated from a user secret r Base64 characters long, every s seconds, for each of u number of users, where the probability of collision is very small, how would I calculate the length of the string that was needed?

In case my wording is bad (I'm new to this), here's an example: Say it's generated every 30 seconds from the user's secret, I have 100,000,000 users, the algorithm output is 8 characters long, and the secret for each user is a 32 character Base64 string randomly generated when the user signed up. What would the probability be that two users got the same algorithm output as each other in any one 30s* period? How would I figure out how many characters I would need to make it a one in a million chance?

* Like TOTP, for all users, the 30s period would be the same, i.e. between seconds 0-29 or 30-59 on the clock.




Solution

If I wanted to create a TOTP-esque algorithm that generated a string n characters long, with each character being a base64 character, generated from a user secret r Base64 characters long, every s seconds, for each of u number of users, where the probability of collision is very small, how would I calculate the length of the string that was needed?

Well, the probability that two specific users get the same r secret is $2^{-6r}$; there are $u(u-1)/2$ distinct pairs of users, hence the probability that there exists a pair somewhere with the same secret is no more than $2^{-6r} u(u-1)/2$ - with the values of $r, u$ you mentioned ($r=32, u=100,000,000$), this is tiny (circa $2^{-138.8}$, and so we can safely ignore that.

Now, the probability that, during a specific s second time interval, that two specific users get the same n character sequence is $2^{-6n}$; again, there are $u(u-1)/2$ distinct pairs of users, hence the probability that there exists a pair somewhere with the same output is no more than $2^{-6n} u(u-1)/2$

Now, you suggest $n=8$, if you plug that into the above equation with $u=100,000,000$, you get a value larger than 1 (hence that provides no meaningful bound on the collision probability, and in practice means that there likely exists several collisions).

To deduce what $n$ would need to be to make the probability bounded by $10^{-6}$, we would need to solve the problem $10^{-6} \ge 2^{-6n} u(u-1)/2$; solving this numerically for $u=100,000,000$ gives $n \ge 12.014...$; that is, an $n$ of 12 isn't quite enough to give you the probability you're looking for (but it's close).

Now, if you want that $10^{-6}$ probability to span over $k$ intervals (that is, that there is a collision anywhere in the $k$ $s$-second span, you'd need to solve $10^{-6} \ge k \cdot 2^{-6n} u(u-1)/2$





Comments (2)

  • +0 – Thanks for the detailed answer! Can you expand on where the $2^{-6r}$ comes from please? I am trying to figure out where the in the equation the character set (Base64) for the secret and output comes into play, and I feel it should factor in here, but I don't see how it does. — Aug 15, 2019 at 21:04  
  • +1 – the probability that $k$ bits match is $(1/2)^k$ if you have $r$ base 64 characters, $k=6r$ since base 64 characters are represented with $\log_2(64)=6$ bits. So $(1/2)^k=2^{-6r}.$ — Aug 15, 2019 at 21:45