java.lang.Random falls "mainly in the planes"
The java.util.Random class uses an algorithm called a
Linear Congruential Generator.
As mathematics researcher George Marsaglia famously described1,
one of the flaws of this algorithm is that the random numbers it generates
"fall mainly in the planes". This essentially means that, among other flaws, a generator such as java.lang.Random
is less suited to generating random tuples or combinations of different values.
Falling in the planes
The problem manifests itself when we pull out series of numbers at a time
from the generator. For reasons that will become clear, the problem is easiest to
visualise with series of 3 numbers. Imagine we have a cube, which we will fill by
plotting random points. We repeatedly take 3 numbers from our random nunmber generator,
which we use for the x, y and z coordinates of the point.
Doing this enough times, we'd eventually expect to fill the volume of the cube,
given a couple of extra conditions2. But when we use a LCG, however large
the period, what we actually find is that all of the points generated lie on a limited
number of equally-spaced "planes" intersecting the cube. If you're not used to thinking mathematically
about "planes", imagine a number of blades cutting through a block of cheese.
Writing a computer program to plot our points on a cube, we end up with something
such as the following:
"Random" points plotted on a
cube using the infamous
RANDU algorithm.
Now, the seriousness of this problem depends on the parameters of the LCG
(the values of a, c and m in the LCG equation
introduced on the previous page). For illustration, we picked a particularly
bad combination: the RANDU generator, something of a legend in how not to
implement a random numbe rgenerator, and unfortunately used in various mainframes
and scientific experiments until at least some time in the 1970s3.
But even though this is pretty much the worst example there is, all LCG random
number generators— including java.lang.Random— suffer from essentially the same problem,
just on a smaller scale. With some slightly judicious mathematics, Marsaglia showed that
the maximum number of planes depends on the modulus of the LCG (248
in the case of java.lang.Random, 231 in the case of RANDU),
and on the number of dimensions. (The problem is easier to visualise in 2 or 3 dimensions,
but mathematically, it extrapolates to any number of dimensions.) The bigger the modulus and
the smaller the number of dimensions, the better:
maximum planes = (n! m) 1 / n
where n is the number of dimensions, and m is the modulus. (The !
is the factorial operator, so 3! is 1x2x3=6, 4! is 1x2x3x4=24 etc.) For Java's
java.lang.Random modulus of 248, this gives a maximum of just over
119,000 planes4. This may not sound too bad, but it is something in
the order of 0.003% of the theoretical maximum5.
Now you might be thinking, "well what the hell, I'm not using java.lang.Random to
plot graphics". But see the illustration above as a graphical illustration of
a mathematical problem: picking series of numbers from Random or
any generator using the LCG algorithm will always introduce artefacts into
the combinations of numbers produced. For example, generating strings of a certain
length with successive calls to nextInt() will always produce strings with a subtle
correlation between the characters in the string. Imagine then using these strings as test
data for a hash table, and you might artificially get far more collisions than expected
by chance, due to biases in the combinations of characters that the random number generator
produced. So in general:
Avoid
java.lang.Random (and any LCG) where you need to generate
very random combinations of values. A common alternative in Java is
ThreadLocalRandom.
Further reading
Another significant flaw with LCGs is that the randomness of generated bits varies
according to the bit position.
See also:
1. Marsaglia, G., (1968), "Random Numbers Fall Mainly in the Planes",
Proceedings of the National Academy of Sciences, 61:25-28.
2. We assume that the dimension of a 'point' is 1, and that the cube has dimensions of
n bits.
The generator's period would then have to be at least 23n for all
possible combinations of x, y and z to be created. If each
coordinate has the magnitude of an int (32 bits), this isn't actually possible
with java.lang.Random's period of 248.
3. A slightly sobering thought is that there are
possibly planes flying today that involved the RANDU generator somewhere in their design process...
4. I assume that the implementers of java.lang.Random chose the
other parameters in order to maximise the number of planes.
5. For a back-of-an-envelope calculation, imagine that the generator could produce all points
in the cube (not in fact the case, of course: there are 296 distinct 3D points with integer coordinates,
and java.lang.Random will produce a maximum of 248 distinct combinations).
And imagine that the planes lie orthogonal to the cube (in fact they won't, thus actually increasing the
number of planes that intersect with the cube). Then, with integer coordinates,
there'd be theoretically a maximum of 232 planes. Marsaglia gives a more accurate calculation
for a 232 generator, which comes out at a very similar percentage.
If you enjoy this Java programming article, please share with friends and colleagues. Follow the author on Twitter for the latest news and rants.
Editorial page content written by Neil Coffey. Copyright © Javamex UK 2021. All rights reserved.