Database Administration
sql-server sql-server-2014 sql-server-2005 random
Updated Mon, 25 Jul 2022 22:10:36 GMT

How can I get satisfactory random numbers from the RAND function (or elsewhere)?


I'm creating a pseudo-random data set for application users to train on.

I'm surprised that if I seed the RAND() function with 1, 2, 3, etc., I get very nearly the same result from the seeded function. However, this seems to be followed by "properly random" but repeatable values when a seed is not supplied.

SELECT   RAND(1) AS R1A, RAND() AS R1B, RAND(2) AS R2A, RAND() AS R2B,
         RAND(3) AS R3A, RAND() AS R3B, RAND(4) AS R4A, RAND() AS R4B
0.713591993212924
0.472241415009636  
0.713610626184182  
0.217821139260039  
0.71362925915544  
0.963400850719992  
0.713647892126698  
0.708980575436056

At first sight, it appears that I can evaluate RAND(@seed) and discard the result, then evaluate RAND() to get several really "random" numbers for my training data - so far I've planned to use four per record; I might need some more.

Is that plan going to work properly? And, what am I looking at, here? And, should it be in the documentation? I haven't found it.

Documentation says this, which may be meant as a clue:

The RAND function is a pseudorandom number generator that operates in a manner similar to the C run-time library rand function. If no seed is provided, the system generates its own variable seed numbers.

Does the rand function in C produce similar output for similar seed input?

I think the documentation could also state more clearly that RAND(@number) followed by RAND() always generates the same numbers. But that is what I wanted and what any experienced computer programmer will expect.

I suppose I could fill a table with random data keys obtained from https://www.random.org/ to use for the purpose - but that has drawbacks.

Update, provisional conclusion

I have the following conclusions about RAND() and for now I think I'll continue with it, but keeping alternatives in mind.

RAND(@int) sets the seed of the random number generator using the integer value given, and returns a float result which is not statistically independent, inasmuch as RAND(@int) and RAND(@int+1) produce almost exactly the same result.

Of course, RAND(@int) always produces the same result.

RAND(-@int) and RAND(@int) produce the same result.

RAND(0) is an exception: there may be other exceptions. RAND(0) always produces the result 0.943597390424144, but it is not similar to the result of RAND(1).

RAND() called n times after RAND(@int) always produces the same n numbers. If we call the nth number "rand(@int, @n)" -

CREATE PROCEDURE sproc_rand(@seed int, @nth int, @rand float OUTPUT) AS
SET @rand = RAND(@seed);
WHILE ( @nth > 0 ) BEGIN SET @rand = RAND(); SET @nth = @nth - 1; END

The difference between rand(@int, @n) and rand(@int+1, @n) "modulo 1" -

(1.0 + rand(@int, @n) - rand(@int+1, @n) % 1

Is a constant or nearly constant; for @n = 1 it is approximately 0.75. For @n = 5 it is 0.991. For @n = 6 it is 0.91. For @n = 100 it is 0.83.

So, no, these are not good "random" numbers when a simple incremented seed is used - although they bounce around quite well for @n IN (1, 2, 3, 4).

What I'm now considering as a mitigation is using not RAND(row_id) but,

RAND(row_id * @factor_1 + @factor_2)

where @factor1 and @factor2 are constant terms and @factor_1 is approximately 10,000. And for different projects, different factors.

So far, if I want repeatable results, the other method available (besides the Multiplicative Inverse thing, which I haven't quite understood yet) is to create a table and fill it with non-repeating random numbers from one of the other methods, then draw the numbers from that table, when required.




Solution

The only reason to ever provide a "seed" for RAND() is if you want the exact same sequence of (pseudo) random values for testing purposes.

Just call it without a seed, and that should be good enough for most purposes.

If you do need a predictable sequence, and you're simply unhappy with the starting (random) value, RAND() takes seeds from -2,147,483,648 all the way up to 2,147,483,647 (full range of int), so try some seeds that differ by more than 1 or 2.

Finally, if you were on SQL 2008 or better, you'd also have the option of considering CRYPT_GEN_RANDOM instead (warning, it returns a varbinary, not a float, so you'd likely need to change your code a bit): see this article for some differences and discussion.





Comments (3)

  • +0 – Oops! This is in fact on SQL Server 2014 although our main development server is SQL Server 2005 - lowest common denominator - and the next stage of the project may be back on the SS2005 server as well. I forgot. I wonder how CRYPT_GEN_RANDOM behaves if the seed is 1, 2, 3, ... Something to try on Monday. — Jun 09, 2017 at 19:56  
  • +0 – @RobertCarnegie cryptgenrandom doesn't have any seed. — Jun 10, 2017 at 16:49  
  • +1 – @"Martin Smith" [does that work?] It does take a seed - but the result returned isn't deterministically controlled by the seed. e.g. SELECT CRYPT_GEN_RANDOM (4, 0x01020304) SQL doc page doesn't explain that but Windows doc does: "If an application has access to a good random source, it can fill the pbBuffer buffer with some random data before calling CryptGenRandom. The CSP then uses this data to further randomize its internal seed." It occurs to me this may be an intentionally garbled way of saying "one-time pad" instead of "seed". Only the CIA know for sure. :-) — Jun 12, 2017 at 09:22