This is an excerpt from an article I wrote for Servo Magazine called Random Bits. Throughout new media work we depend on randomness to seed experiments ranging from unconventional computing to computer graphics. Rarely do we stop to contemplate just what this concept of “randomness” actually means, how a microcontroller implements it, how this differs from a computer, and how pseudo-randomness differs from “true” randomness. I delve into these details briefly below.
First, a little history. The concept of chance appears in most ancient cultures, dating back at least as far as the ancient Egyptians' use of the “talus”, an early predecessor to the die. Fashioned from the knuckle or heel bone of a hoofed animal, the talus had four possible outcomes. It has been found alongside tomb illustrations and scoreboards at Egyptian sites, lending support to the idea that playing dice and gambling were popular pastimes.
In the words of Ian Hacking, a historian of probability, “It is hard to find a place where people use no randomizers yet theories of frequency, betting, randomness and probability appear only recently. No one knows why.”
An intriguing mystery! And indeed it was not until 1654 that what we know as probability today began to take shape. The French nobleman Chevalier de Mere had a gambling problem: he wanted to know if it would be profitable to bet that double sixes would appear at least once in a set of 24 throws of two dice. Luckily for him, he was a friend of one of the most brilliant mathematicians of his day, Blaise Pascal. De Mere posed the problem to Pascal, and the theory of probability emerged from the ensuing correspondence between Pascal and his good friend Pierre de Fermat, another brilliant mathematician.
What Fermat and Pascal realized is what we all learned in grammar school: with each throw of the dice, each possible outcome is equally likely, so the probability of throwing a six is 1/6. They further realized that the probabilities of independent events can be multiplied, making the probability of throwing two sixes together 1/6 x 1/6 = 1/36. The chance of not throwing double sixes on a single throw is therefore 35/36, and the chance of missing on all 24 throws is 35/36 multiplied by itself 24 times, or about 0.5086. That leaves a probability of 1 - 0.5086 = 0.4914 of seeing double sixes at least once. So it was not advisable for Chevalier de Mere to bet on throwing two sixes within 24 throws after all, and probability theory was born!
So what does any of this probability stuff have to do with random number generation?
Games of chance are, as Hacking called them, “randomizers”. They are a way of abdicating responsibility for decision making to the world of probabilities. In turn, results returned from such activities are random outcomes. At any given moment of the game you cannot predict with certainty which number will appear next. And a set of random outcomes produces a sequence of random numbers containing no repeating patterns, a sequence which algorithmic information theory would call incompressible.
AIT and Compressibility
As Gregory Chaitin, a founder of algorithmic information theory, describes it, compression occurs if data can be reproduced by a computer program with a smaller number of bits than the original data contains. In other words, if I have a data set of 500 1-byte measurements, the data contains 500 * 8 = 4000 bits. If I can find a repeating pattern in that data, I can simplify it by creating a shorter symbol to represent each repeating subset of the sequence. I can then code an algorithm to process the input string and replace the symbols based on a key. In this way I can shrink the size of the data itself without losing any of the original information.
Original: 70, 15, 111, 32, 56, 70, 15, 70, 15, 7, 1, 3, 5, 20, 67, 54, 42, 29, 113, 7, 1, 3, 5
Compressed: a, 111, 32, 56, a, a, b, 20, 67, 54, 42, 29, 113, b
Key: a = 70, 15; b = 7, 1, 3, 5
The more you can compress the data, the more patterned or predictable it is, the less information it contains and the less random the sequence of numbers is. By contrast, if the data cannot be compressed at all, “if the smallest program for calculating it is just as large as it is… then the data is lawless, unstructured, patternless, not amenable to scientific study, incomprehensible. In a word, random, irreducible!” (Chaitin, p. 64)
So randomness is equivalent to information. The more information a sequence contains, the more difficult it is to compress, and the closer it is to randomness. In this light, white noise becomes pure information and a perfect source, in addition to games of chance, for generating random sequences. Anywhere noise is present, for example radio waves and the atmosphere, we can tap into it to extract random sets of numbers.
Radiation is something we thankfully have rare occasion to contemplate, but it is an excellent source of random numbers. Radioactivity can be defined as “The spontaneous emission of radiation from the nucleus of an unstable atom. As a result of this emission, the radioactive atom is converted, or decays, into an atom of a different element that might or might not be radioactive” (as defined by the Radiation Emergency Assistance Center). For our purposes this means that given an unstable atom, there is no way of predicting when its nucleus will decay and emit radiation. By extension, if you have many radioactive atoms, there is also no way of predicting the intervals between successive decays, and it is exactly this sequence of random intervals that can be used to generate random strings of binary digits.
For more details on radioactive decay as a source of randomness, check out HotBits, a service that supplies random number sequences generated from radioactive decay via the internet.
So, we know that randomness comes from dice throws, card games, coin tosses, noise and radioactive decay, and I think we can safely bet that our computers are not striking up a game of cards each time we request a random number, so how do they do it?
rand() and srand()
Most likely if you are reading this blog post you have also done at least a little programming and have encountered some version of the rand() function or random class. Almost every contemporary computer language and platform has some variant of this, from microcontroller C to Java to Python. These functions generate what are known as pseudo-random sequences of numbers. Why pseudo-random? Well, for one thing the random number generator in your computer is actually just an equation.
Most computer languages, including C and Java, contain a version of the classic algorithm, the linear congruential generator, as the source behind rand(). A linear congruential generator works by computing each successive random number from the previous one, starting with a seed, X0. The seed is generally what you supply to the algorithm by calling srand(seed_value) before requesting a number from rand().
Here is the formula:
Xn+1 = (aXn + c) mod m
where Xn is the sequence of output values
m is the modulus value
a is the multiplier value
c is the increment value and X0 is the initial seed value (e.g. supplied by srand)
In ANSI C, for example, m = 2^32, a = 1103515245, and c = 12345. The number returned by the rand function is actually drawn only from bits 30:16 of the result of the formula.
Similarly, Java uses a linear congruential generator, with an a value of 25,214,903,917, a c value of 11, and a modulus m of 2^48.
One of the defining characteristics of pseudo random number generators is that given the same seed they will always produce exactly the same series of numbers as output. This is very useful when you need a sequence of data that appears random and is the same each time, for example in rendering computer graphics, but it is an annoyance for most of the applications we have implemented in this column.
The other problem with linear congruential generators is that they are not terribly random. The longest sequence a generator can produce before it begins to repeat from the beginning is at most the modulus value m (2^32 in C), a characteristic referred to as the period of the generator. Additionally, the output exhibits serial correlation, which means that there are predictable patterns in the data. To return to our discussion of algorithmic information theory above, this means that the data could be compressed; it is not 100% random.
In comparison, the Python programming language uses a different random number generation algorithm known as the Mersenne Twister. This algorithm has a vastly longer period (2^19937 - 1) and passes far more stringent statistical tests of randomness, but it requires more memory to implement and is therefore not suitable for microcontrollers or other space-constrained implementations.
What about microcontrollers?
CCS reports their rand() function to be the following:

unsigned int16 rand(void)
{
    _Randseed = _Randseed * 1103515245 + 12345;
    return ((unsigned int16)(_Randseed >> 16) % RAND_MAX);
}
This is clearly derived from the ANSI C example we looked at above: you will recognize the linear congruential function, simply separated into two steps. Notice also that the a and c values are identical to the ones above. The main difference is that the CCS version reduces the shifted result modulo the user-defined value RAND_MAX rather than always returning 15 bits, which makes explicit the relationship between RAND_MAX and the range of values the generator can return.
Hitec did not respond to my query about their random number generation technique, but it is probably safe to assume it is similar.
The full article, including a more detailed discussion of microcontroller techniques, is available from Servo Magazine.
Suggestions for Further Reading
Gregory Chaitin, Meta Math!
Ian Hacking, The Emergence of Probability: A Philosophical Study of Early Ideas about Probability, Induction and Statistical Inference
Peter J. Bentley, The Book of Numbers: The Secret of Numbers and How They Changed the World