Using random numbers for simulations: Random.nextGaussian()

The random numbers that we've been generating so far have all had a uniform distribution. In other words, if we call, say, nextInt(1000), every integer between 0 and 999 inclusive has a roughly equal chance of being returned. In many cases, this is precisely what we want. But in some cases, we want to use a random number generator to simulate an event where the possible values or outcomes don't have an equal probability of occurring. This situation crops up frequently in testing and simulation applications.

For example, imagine we have some server program that receives packets (byte arrays) over the network and passes them off to different threads, which stick them together into a "command" and execute it. Networks being what they are, we don't always get a whole "command" in one go. So, threads will sometimes be woken up with a packet, add it to their queue, then go back to sleep; at other times, they'll be woken up with the "last" packet in their queue and will execute the function in question, thus hogging a processor for longer. In other words, we have an essentially "complex" situation with some randomness in what happens. Now, our problem is that we want to find out how our server will scale in the future: say, if it received twice or ten times the volume of packets per second.

One way we can do this is to run a simulation: we write a method that, with some frequency that we want to test, makes up random packets of data and injects them into our server's receivePacket() method. We know that in real life, the packets are "random" in size and arrive at "random" intervals, so we also want to simulate this randomness. As a first attempt, we could measure the average length of a packet and average number of milliseconds between packets as they occur in real life. Then we can write a simulation that picks random numbers around this range, but (say) halves the average number of milliseconds between packets. To pick the random numbers, we could use nextInt().
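As a rough illustration of this first attempt, the sketch below makes up random packets and hands them to a receiver at random intervals, using nextInt() for both the packet length and the gap. The class name, the inject() method and the Consumer standing in for the server's receivePacket() method are all assumptions made for the example, as are the figures of 64 bytes and 250 milliseconds.

```java
import java.util.Random;
import java.util.function.Consumer;

class UniformPacketSimulation {

    /**
     * Injects 'count' made-up packets into 'receiver', pausing between them.
     * Lengths and gaps are picked with nextInt(), so they are uniformly
     * distributed with averages of roughly avgLength and avgGapMillis
     * (both must be at least 1). Returns the number of packets delivered.
     */
    static int inject(Random rnd, int count, int avgLength, int avgGapMillis,
                      Consumer<byte[]> receiver) {
        int delivered = 0;
        for (int i = 0; i < count; i++) {
            // Gap is uniform in 0 .. 2*avg-1: every value is equally likely.
            int gap = rnd.nextInt(2 * avgGapMillis);
            // Length is uniform in 1 .. 2*avg-1, so its mean is avgLength.
            int length = 1 + rnd.nextInt(2 * avgLength - 1);

            byte[] packet = new byte[length];
            rnd.nextBytes(packet); // fill the packet with random data

            try {
                Thread.sleep(gap);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop the simulation early.
                Thread.currentThread().interrupt();
                return delivered;
            }
            receiver.accept(packet);
            delivered++;
        }
        return delivered;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the server's receivePacket(): log the size.
        // If we measured 500 ms between packets in real life, passing 250
        // simulates double the load.
        inject(new Random(), 5, 64, 250,
               packet -> System.out.println(packet.length + " bytes"));
    }
}
```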

The main problem with this approach is that nextInt() does not simulate how values differ from the average. For example, we might measure the average number of milliseconds between packets to be around 500. But calling nextInt(1000) to simulate this average duration is unrealistic. The sequences (100, 800, 200, 900) and (450, 550, 580, 420) both have an average duration of 500; but common sense (or measurement of the network behaviour) tells us that the second sequence is much more likely in practice: observed durations tend to cluster around the average. In other words, durations don't occur with equal likelihood.
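To see the lack of clustering for ourselves, we can count how many values from nextInt(1000) land near the average. The following is only a sketch: the class and method names are made up for the example, and the window of 100 milliseconds either side of 500 is an arbitrary choice.

```java
import java.util.Random;

class UniformSpread {

    /**
     * Draws 'samples' values from nextInt(1000) and returns the fraction
     * that fall within 'window' of 500.
     */
    static double fractionNearAverage(Random rnd, int samples, int window) {
        int near = 0;
        for (int i = 0; i < samples; i++) {
            int gap = rnd.nextInt(1000);
            if (Math.abs(gap - 500) <= window) {
                near++;
            }
        }
        return (double) near / samples;
    }

    public static void main(String[] args) {
        double fraction = fractionNearAverage(new Random(), 100_000, 100);
        System.out.printf("Within 100 ms of the average: %.1f%%%n", 100 * fraction);
    }
}
```

Because every value is equally likely, only around a fifth of the simulated durations fall within 100 milliseconds of the average; the rest are spread evenly across the whole range, which is not what we would expect of real network traffic.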

nextGaussian()

This is where the Random class's nextGaussian method comes in. It generates random numbers that "cluster naturally" around an average. Mathematically, it creates random numbers with a normal distribution.
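The value returned by nextGaussian() comes from a normal distribution with a mean of 0.0 and a standard deviation of 1.0, so to use it we typically scale and shift it to the average and spread we want. The sketch below does this for the gap between packets. The class and method names are invented for the example, and the standard deviation of 100 milliseconds is an assumption: in practice it would come from measuring the real network. Negative results are clamped to zero, which is one simple way of dealing with values that make no sense as a duration.

```java
import java.util.Random;

class GaussianGaps {

    /**
     * Returns a gap in milliseconds drawn from a normal distribution with
     * the given mean and standard deviation. Negative values are clamped
     * to zero, since a gap cannot be negative.
     */
    static long nextGapMillis(Random rnd, double meanMillis, double stdDevMillis) {
        // nextGaussian() has mean 0.0 and standard deviation 1.0, so we
        // multiply by the desired spread and add the desired average.
        double value = meanMillis + stdDevMillis * rnd.nextGaussian();
        return Math.max(0L, Math.round(value));
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        // Assumed figures: average gap 500 ms, standard deviation 100 ms.
        for (int i = 0; i < 10; i++) {
            System.out.println(nextGapMillis(rnd, 500, 100) + " ms");
        }
    }
}
```

With these figures, roughly two thirds of the generated gaps should fall between 400 and 600 milliseconds, with values further from 500 becoming progressively rarer.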


Editorial page content written by Neil Coffey. Copyright © Javamex UK 2021. All rights reserved.