If I wanted to create a TOTP-esque algorithm that generated a string n
characters long, with each character being a base64 character, generated from a user secret r
Base64 characters long, every s
seconds, for each of u
number of users, where the probability of collision is very small, how would I calculate the length of the string that was needed?
In case my wording is bad (I'm new to this), here's an example: Say it's generated every 30 seconds from the user's secret, I have 100,000,000 users, the algorithm output is 8 characters long, and the secret for each user is a 32 character Base64 string randomly generated when the user signed up. What would the probability be that two users got the same algorithm output as each other in any one 30s* period? How would I figure out how many characters I would need to make it a one in a million chance?
* Like TOTP, for all users, the 30s period would be the same, i.e. between seconds 0-29 or 30-59 on the clock.
If I wanted to create a TOTP-esque algorithm that generated a string
n
characters long, with each character being a base64 character, generated from a user secretr
Base64 characters long, everys
seconds, for each ofu
number of users, where the probability of collision is very small, how would I calculate the length of the string that was needed?
Well, the probability that two specific users get the same r
secret is $2^{-6r}$; there are $u(u-1)/2$ distinct pairs of users, hence the probability that there exists a pair somewhere with the same secret is no more than $2^{-6r} u(u-1)/2$ - with the values of $r, u$ you mentioned ($r=32, u=100,000,000$), this is tiny (circa $2^{-138.8}$, and so we can safely ignore that.
Now, the probability that, during a specific s
second time interval, that two specific users get the same n
character sequence is $2^{-6n}$; again, there are $u(u-1)/2$ distinct pairs of users, hence the probability that there exists a pair somewhere with the same output is no more than $2^{-6n} u(u-1)/2$
Now, you suggest $n=8$, if you plug that into the above equation with $u=100,000,000$, you get a value larger than 1 (hence that provides no meaningful bound on the collision probability, and in practice means that there likely exists several collisions).
To deduce what $n$ would need to be to make the probability bounded by $10^{-6}$, we would need to solve the problem $10^{-6} \ge 2^{-6n} u(u-1)/2$; solving this numerically for $u=100,000,000$ gives $n \ge 12.014...$; that is, an $n$ of 12 isn't quite enough to give you the probability you're looking for (but it's close).
Now, if you want that $10^{-6}$ probability to span over $k$ intervals (that is, that there is a collision anywhere in the $k$ $s$-second span, you'd need to solve $10^{-6} \ge k \cdot 2^{-6n} u(u-1)/2$