 Cryptography

# Uniform rejection sampling by shifting or rotating bits from CSPRNG output, safe?

Given a CSPRNG (arc4random, using the ChaCha20 keystream), one wants to use rejection sampling to draw random numbers in a smaller range while avoiding modulo bias (arc4random_uniform()).

OpenBSD's arc4random_uniform() implementation draws a 32-bit value and tests whether it fits, rejecting values that don't.

It's not the only possible implementation. In *Efficiently Generating a Number in a Range*, M.E. O'Neill surveys several other ways to generate random numbers within a range while avoiding modulo bias.

In particular, there's the so-called "Bitmask with Rejection (Unbiased), Apple's Method". Following the link to Apple's actual implementation, one can see it has another trick:

```
/*
 * Calculate a uniformly distributed random number less than upper_bound
 * avoiding "modulo bias".
 *
 * Uniformity is achieved by trying successive ranges of bits from the random
 * value, each large enough to hold the desired upper bound, until a range
 * holding a value less than the bound is found.
 */
uint32_t
arc4random_uniform(uint32_t upper_bound)
{
    if (upper_bound < 2)
        return 0;

    // find smallest 2**n - 1 >= upper_bound
    int zeros = __builtin_clz(upper_bound);
    int bits = CHAR_BIT * sizeof(uint32_t) - zeros;
    uint32_t mask = 0xFFFFFFFFU >> zeros;

    do {
        uint32_t value = arc4random();

        // If low 2**n - 1 bits satisfy the requested condition, return result
        uint32_t result = value & mask;
        if (result < upper_bound) {
            return result;
        }

        // otherwise consume remaining bits of randomness looking for a
        // satisfactory result.
        int bits_left = zeros;
        while (bits_left >= bits) {
            value >>= bits;
            result = value & mask;
            if (result < upper_bound) {
                return result;
            }
            bits_left -= bits;
        }
    } while (1);
}
```

Instead of drawing a new value when the current one doesn't fit the range, the value is shifted to discard the bits involved in the failed test, until either the value fits or there are not enough unused bits left.

This is especially useful if the CSPRNG is costly, for example if it requires a system call to generate each value: reducing the number of times it's invoked, while algorithmically more complex, might improve the overall speed of arc4random_uniform().

And now my question: given a 32-bit value generated by a CSPRNG, and given that the CSPRNG output is indistinguishable from a random stream, i.e. every value is as likely as any other, why discard all of the tested bits?

1. Is it possible to shift the value by one bit at a time and test the resulting value against the range?
```
do {
    uint32_t value = arc4random();
    for (int bits_left = 32; bits_left >= bits; bits_left--) {
        uint32_t result = value & mask;
        if (result < upper_bound) {
            return result;
        }
        value = value >> 1; // shift right by one bit
    }
} while (1);
```
2. Is it possible to rotate the value by one bit at a time and test the resulting value against the range?
```
do {
    uint32_t value = arc4random();
    for (int i = 0; i < 32; i++) {
        uint32_t result = value & mask;
        if (result < upper_bound) {
            return result;
        }
        value = ror32(value, 1); // rotate right by one bit
    }
} while (1);
```

Are those optimizations safe for building arc4random_uniform()?

If not, on what basis?

## Solution

> Are those optimizations safe for building arc4random_uniform()?

No, they are not.

> If not, on what basis?

Because after we test the value of `result` and reject it (that is, it is `>= upper_bound`), the bits in the tested region are no longer uniform: we know that those bits within `value` encode a value `>= upper_bound` (and so values less than that will never appear there). As those bits are nonuniform, reusing them in future tests will give nonuniform results.

Let us give a simple example: suppose upper_bound = 3 (and hence mask = 3). Then, on the first iteration, if the lower two bits of value are 0, 1 or 2, we'll return that value, each with equal probability; not an issue. However, if the lower two bits happen to be 3 (that is, the bits are 11), we won't return on that iteration, and so we'll go to the next. Your ideas would shift the value right one bit, resulting in the lower two bits being x1 (where x is the original bit 2, which we haven't tested yet); that is, the lower two bits will be either the value 1 or the value 3. If it's 1, that iteration will return it; otherwise it'll skip to the next iteration. In no case will the second iteration return 0 or 2 (and similarly in later iterations); hence we have a strong bias towards 1.

The original code (which discards all the bits tested after a rejection) doesn't have this problem.

• +0 – @poncho could you get away with shifting by `popcnt(n)`? I think it's only the `1` bits in `upper_bound` that cause the issue, as they can end up fixing certain lower bits. But shifting by `popcnt(n)` will always shift out as many bits as were potentially forced to trailing 1s. I.e. if you have `upper_bound = 6`, for `value & mask >= n` the bits in positions 2/3 must be `1`, and those are shifted out, but the lower bit can be anything. — Aug 02, 2022 at 13:04
• +0 – @Noah: I assume you mean `popcnt(result)`; you have to be careful, you know that result is a value between upper_bound and mask, but could be any value in that range. For example, if upper_bound = 5, then result is one of 5, 6, 7 - that is, the lsbit is also biased toward 1. Now, there are algorithms that do harvest that amount of entropy to try to rely on fewer bits from the rng; however those algorithms are not cheap. I'm sure Apple knew about those fancier algorithms, but they don't save that much in most cases, and so Apple went with the overall cheaper algorithm. — Aug 02, 2022 at 13:28
• +0 – @poncho err, `popcnt(upper_bound)` (although that is wrong); really it's `clz(upper_bound) - ctz(upper_bound)` + using `rol`. The lower 0s in `upper_bound` are entirely unrelated to the comparison, so there is no bias in keeping them around. — Aug 03, 2022 at 04:47