Notes

Chapter 7: Mechanisms in Programs and Nature

Section 5: The Intrinsic Generation of Randomness


Random number generators

A fairly small number of different types of random number generators have been used in practice, so it is possible to describe all the major ones here.

Linear congruential generators.

The original suggestion made by Derrick Lehmer in 1948 was to take a number n and at each step to replace it by Mod[a n, m]. Lehmer used a = 23 and m = 10^8 + 1. Most subsequent implementations have used m = 2^j, often with j = 31. Such choices are particularly convenient on computers where machine integers are represented by 32 binary digits. The behavior of the linear congruential generator depends greatly on the exact choice of a. Starting with the so-called RANDU generator used on mainframe computers in the 1960s, a common choice made was a = 65539. But as shown in the main text, this choice leads to embarrassingly obvious regularities. Starting in the mid-1970s, another common choice was a = 69069. This was also found to lead to regularities, but only in six or more dimensions. (Small values of a also lead to an excess of runs of identical digits, as mentioned on page 903.)

The repetition period for a generator with rule n -> Mod[a n, m] is given (for a and m relatively prime) by MultiplicativeOrder[a, m]. If m is of the form 2^j, this implies a maximum period for any a of m/4, achieved when MemberQ[{3, 5}, Mod[a, 8]]. In general the maximum period is CarmichaelLambda[m], where the value m-1 can be achieved for prime m.

As illustrated in the main text, when m=2^j the right-hand base 2 digits in numbers produced by linear congruential generators repeat with short periods; a digit k positions from the right will typically repeat with period no more than 2^k. When m=2^j-1 is prime, however, even the rightmost digit repeats only with period m-1 for many values of a.

More general linear congruential generators use the basic rule n -> Mod[a n + b, m], and in this case, n = 0 is no longer special, and a repetition period of exactly m can be achieved with appropriate choices of a, b and m. Note that if the period is equal to its absolute maximum of m, then every possible n is always visited, whatever n one starts from. Page 962 showed diagrams that represent the evolution for all possible starting values of n.

Each point in the 2D plots in the main text has coordinates of the form {n[i], n[i+1]} where n[i+1] = Mod[a n[i], m]. If one could ignore the Mod, then the coordinates would simply be {n[i], a n[i]}, so the points would lie on a single straight line with slope a. But the presence of the Mod takes the points off this line whenever a n[i] >= m. Nevertheless, if a is small, there are long runs of n[i] for which the Mod is never important. And that is why in the case a = 3 the points in the plot fall on obvious lines.

In the case a = 65539, the points lie on planes in 3D. The reason for this is that

n[i+2] == Mod[65539^2 n[i], 2^31] == Mod[6 n[i+1] - 9 n[i], 2^31]

so that in computing n[i+2] from n[i+1] and n[i] only small coefficients are involved.

It is a general result related to finding short vectors in lattices that for some d the quantity n[i+d] can always be written in terms of the n[i+k]/; k<d using only small coefficients. And as a consequence, the points produced by any linear congruential generator must lie on regular hyperplanes in some number of dimensions.

(For cryptanalysis of linear congruential generators see page 1089.)

Linear feedback shift registers.

Used since the 1950s, particularly in special-purpose electronic devices, these systems are effectively based on running additive cellular automata such as rule 60 in registers with a limited number of cells and with a certain type of spiral boundary conditions. In a typical case, each cell is updated using

LFSRStep[list_] :=Append[Rest[list], Mod[list[[1]] + list[[2]], 2]]

with a step of cellular automaton evolution corresponding to the result of updating all cells in the register. As with additive cellular automata, the behavior obtained depends greatly on the length n of the register. The maximal repetition period of 2^n-1 can be achieved only if Factor[1+x+x^n, Modulus->2] finds no factors. (For n < 512, this is true when n = 1, 2, 3, 4, 6, 7, 9, 15, 22, 28, 30, 46, 60, 63, 127, 153, 172, 303 or 471. Maximal period is assured when in addition PrimeQ[2^n-1].) The pictures below show the evolution obtained for n = 30 with

NestList[Nest[LFSRStep, #, n]&, Append[Table[0, {n-1}], 1], t]

Like additive cellular automata as discussed on page 951, states in a linear feedback shift register can be represented by a polynomial FromDigits[list, x]. Starting from a single 1, the state after t steps is then given by

PolynomialMod[x^t, {1 + x + x^n, 2}]

This result illustrates the analogy with linear congruential generators. And if the distribution of points generated is studied with the Cantor set geometry, the same kind of problems occur as in the linear congruential case (compare page 1094).

In general, linear feedback shift registers can have "taps" at any list of positions on the register, so that their evolution is given by

LFSRStep[taps_List, list_] := Append[Rest[list], Mod[Apply[Plus, list[[taps]]], 2]]

(With taps specified by the positions of 1's in a vector of 0's, the inside of the Mod can be replaced by vec . list as on page 1087.) For a register of size n the maximal period of 2^n-1 is obtained whenever x^n + Apply[Plus, x^(taps - 1)] is one of the EulerPhi[2^n - 1]/n primitive polynomials that appear in Factor[Cyclotomic[2^n - 1, x], Modulus -> 2]. (See pages 963 and 1084.)

One can also consider nonlinear feedback shift registers, as discussed on page 1088.

Generalized Fibonacci generators.

It was suggested in the late 1950s that the Fibonacci sequence f[n_] := f[n-1] + f[n-2] modulo 2^k might be used with different choices of f[0] and f[1] as a random number generator (see page 891). This particular idea did not work well, but generalizations based on the recurrence f[n_] :=Mod[f[n - p] + f[n - q], 2^k] have been studied extensively, for example with p = 24, q = 55. Such generators are directly related to linear feedback shift registers, since with a list of length q, each step is simply

Append[Rest[list], Mod[list[[1]] + list[[q - p + 1]], 2^k]]

Cryptographic generators.

As discussed on page 598, so-called stream cipher cryptographic systems work essentially by generating a repeatable random sequence. Practical stream cipher systems can thus be used as random number generators. Starting in the 1980s, the most common example has been the Data Encryption Standard (DES) introduced by the U.S. government (see page 1085). Unless special-purpose hardware is used, however, this method has not usually been efficient enough for practical random number generation applications.

Quadratic congruential generators.

Several generalizations of linear congruential generators have been considered in which nonlinear functions of n are used at each step. In fact, the first known generator for digital computers was John von Neumann's "middle square method"

n->FromDigits[Take[IntegerDigits[n^2, 10, 20], {5, 15}], 10]

In practice this generator has too short a repetition period to be useful. But in the early 1980s studies of public key cryptographic systems based on number theoretical problems led to some reinvestigation of quadratic congruential generators. The simplest example uses the rule

n -> Mod[n^2, m]

It was shown that for m = p q with p and q prime the sequence Mod[n, 2] was in a sense as difficult to predict as the number m is to factor (see page 1090). But in practice, the period of the generator in such cases is usually too short to be useful. In addition, there has been the practical problem that if n is stored on a computer as a 32-bit number, then n^2 can be 64 bits long, and so cannot be stored in the same way. In general, the period divides CarmichaelLambda[CarmichaelLambda[m]]. When m is a prime, this implies that the period can then be as long as (m-3)/2. The largest m less than 2^16 for which this is true is 65063, and the sequence generated in this case appears to be fairly random.

Cellular automaton generators.

I invented the rule 30 cellular automaton random number generator in 1985. Since that time the generator has become quite widely used for a variety of applications. Essentially all the other generators discussed here have certain linearity properties which allow for fairly complete analysis using traditional mathematical methods. Rule 30 has no such properties. Empirical studies, however, suggest that the repetition period, for example, is about 2^0.63 n, where n is the number of cells (see page 260). Note that rule 45 can be used as an alternative to rule 30. It has a somewhat longer period, but does not mix up nearby initial conditions as quickly as rule 30. (See also page 603.)


From Stephen Wolfram: A New Kind of Science [citation]