Defining randomness

What does it mean to say that something is "random"? While recently reading Taleb's book "Fooled by Randomness", it struck me that even experts on the subject don't seem to provide a meaningful definition of the word.

Throughout the book, when Taleb isn't dropping not-so-subtle hints about his intellectual superiority and the deep, fulfilling existence he enjoys in our all-too-materialistic world (which is ironic given he now spends his days insulting people on Twitter), he often writes about the concept of "alternative histories". That is, ways in which the world could have turned out differently from how it actually did. While on the surface this idea seems intuitive, it doesn't mesh with a deterministic worldview: how could an alternative history have arisen? What would have been different? If you view the world as deterministic, alternative histories are not possible.

Similarly, in "Superforecasting", Philip Tetlock also seems to like the idea of alternative histories. Discussing a forecast by Jonathan Schell regarding the possibility of a future nuclear war, Tetlock writes (p. 53): "the only way to settle this definitively would be to rerun history hundreds of times, and if civilization ends in piles of irradiated rubble in most of those reruns, we would know Schell was right..." The problem, however, is this: through what mechanism could the world have turned out differently than it did? That is, what is causing these different outcomes to be realized? Tetlock and Taleb both seem to be saying that "randomness" is the answer to that question.

I think what both Tetlock and Taleb actually mean by "alternative histories" is not that history could literally have unfolded differently than it has, but that before the present history was realized, there were many possible alternative histories consistent with our information at each point in time along the way. More precisely, an alternative history from some time t to the present is any possible history of the world (different from the realized one) that is consistent with the realized state of the world up to time t. There is only one history that can be realized (if we think the world is deterministic); we just don't know what it is before it happens.

Therefore, we should think of randomness as synonymous with a lack of information or a lack of predictability. A random event is one whose outcome cannot be predicted with certainty. Thus any event that cannot be perfectly predicted can be said to be random, and less predictable events are therefore "more random". Under this definition, something that is random to you may not be random to me (if I am in possession of information that allows me to know the result of the event with certainty).

Consider the statement "Donald Trump has a 29% chance of winning the U.S. Presidential election." What does this mean? Using Tetlock's interpretation, it means that if we were to live out the days between the time of this forecast and the time of the election many times, Trump would win the election in 29% of them. That is, Trump winning the election was a "29% event".

This is clearly absurd. If we were to travel back in time and relive the fall of 2016 many times, Trump would win every time. How could it not be so? If you disagree, feel free to specify what would cause a different election result to be realized. Even if you don't view the world as deterministic (apparently quantum physics says it isn't, although I wouldn't know), there are events, such as the U.S. presidential election, that just cannot easily be fit into this framework where rerunning history produces different outcomes. The uncertainty, or randomness, of the U.S. election result is due to a lack of information, not the kind of "intrinsic randomness" people discuss in quantum physics.

A more reasonable interpretation of the 29% Trump forecast is the following: given the information available to us at the time of the forecast, the outcome "Trump wins the US Presidential election" occurs in 29% of all possible worlds consistent with our information at the time of the forecast. This information would include things such as recent polling results and historical election outcomes. Given that Clinton was polling more favourably than Trump, it obviously makes sense that Trump's win probability would be less than 50% - that is, there are more possible states of the world in which Clinton wins than worlds in which she doesn't, given that we are restricting the set of worlds to those where Clinton is polling favourably at the time of the forecast. You could think of each world as being identical except for the election vote distribution; then, it should make sense that there will be more worlds with a vote distribution that results in a Clinton win than a Trump win, given that the polls favoured Clinton at the time of the prediction we are considering (which should rule out more of the landslide-Trump-win worlds).
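The "possible worlds" reading can be made concrete with a small simulation. The sketch below is not how any real election forecast is built; it assumes, purely for illustration, a single national vote margin, a hypothetical 3-point poll lead, and polling error that is normally distributed with a hypothetical 5-point standard deviation. Every simulated world shares the same information (the poll lead) and differs only in the actual vote margin, and we count the share of worlds in which the trailing candidate wins. The function name `trailing_candidate_win_share` and all numbers are hypothetical.

```python
import random

def trailing_candidate_win_share(poll_lead, polling_error_sd, n_worlds=100_000, seed=0):
    """Fraction of simulated 'possible worlds' in which the candidate
    trailing in the polls wins.

    Every world is consistent with the same information (the observed poll
    lead). Worlds differ only in the actual vote margin, modelled here, as an
    assumption, as the poll lead plus normally distributed polling error.
    """
    rng = random.Random(seed)
    trailing_wins = 0
    for _ in range(n_worlds):
        # Actual margin of the poll leader in this world (hypothetical model).
        actual_margin = poll_lead + rng.gauss(0.0, polling_error_sd)
        if actual_margin < 0:  # leader's margin is negative, so the trailing candidate wins
            trailing_wins += 1
    return trailing_wins / n_worlds

# Hypothetical inputs: a 3-point poll lead and a 5-point polling error.
print(trailing_candidate_win_share(poll_lead=3.0, polling_error_sd=5.0))
```

With these made-up inputs, the trailing candidate wins in a bit over a quarter of the worlds, and widening the poll lead shrinks that share, which matches the intuition that favourable polls rule out more of the worlds where the leader loses.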

This is a Bayesian interpretation of probability: it is just a statement of belief about the future state of the world. Another reasonable interpretation of the 29% Trump event is a frequentist one: out of all elections for which we had the same information as we had at the time of the Trump forecast, 29% resulted in the lower-polling candidate (e.g. Trump in 2016) winning. This also seems reasonable, since the frequentist view of probability can be held even under a worldview in which everything is deterministic.

The frequentist interpretation can be more naturally applied to an experimental setting such as flipping a coin. First, we specify the information set: there will be a human doing the flipping and the coin will be landing on a flat surface. The probability of Heads is then defined as the long-run frequency of Heads from repeated experiments involving a human flipping a coin onto a flat surface. The world is still deterministic: for any given flip, it was always going to be Heads or Tails. If we could know the position of every particle that can affect the outcome of the flip, then we would know with certainty whether it will land Heads or Tails. It doesn't seem to make sense to call flipping a coin a "50% event" any more than it does to call Trump's election a "29% event"; the probabilities of these events reflect our ignorance about the future state of the world, nothing more.
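To illustrate how a fully deterministic process can still produce a stable long-run frequency, here is a toy model. It assumes a simplified "physics" in which the coin starts Heads-up and the outcome depends only on how many half-turns it completes in the air (spin rate times flight time); real coin dynamics are more complicated. The function names and the ranges for the flipper's spin rate and flight time are hypothetical, chosen only to represent our ignorance about exactly how a human will flip.

```python
import math
import random

def deterministic_flip(spin_rate, flight_time):
    """Toy physics: the outcome is fully determined by the initial conditions.

    spin_rate is in revolutions per second and flight_time in seconds. The coin
    starts Heads-up, so an even number of completed half-turns lands Heads.
    """
    half_turns = math.floor(2 * spin_rate * flight_time)
    return "H" if half_turns % 2 == 0 else "T"

def human_flip_frequency(n_flips, seed=0):
    """Long-run frequency of Heads when we only know the initial conditions
    to within a range (hypothetical ranges for a human flipper)."""
    rng = random.Random(seed)
    heads = 0
    for _ in range(n_flips):
        spin_rate = rng.uniform(30.0, 40.0)  # revolutions per second (assumed range)
        flight_time = rng.uniform(0.4, 0.6)  # seconds (assumed range)
        if deterministic_flip(spin_rate, flight_time) == "H":
            heads += 1
    return heads / n_flips

print(deterministic_flip(35.0, 0.5))   # same inputs, same outcome, every time
print(human_flip_frequency(100_000))   # close to 0.5
```

Each individual flip has a fixed answer once the initial conditions are set; the roughly 50% frequency only appears because we average over initial conditions we cannot measure.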

Under this view, it may be difficult to talk about "true" or "objective" probabilities. The definition that makes the most sense to me is to define the true probability of an event as the "best" (e.g. most accurate) possible prediction that could be made given the information set. How you define "best possible" is obviously ambiguous; we could think of using predictions from an information-aggregating machine that is built to form the best possible predictions. The frequentist interpretation also seems pretty useful here: if we can collect data on a sufficiently large number of past events that had the same information set as the event under consideration, then the long-run frequency will be the true probability of the event. The fact that a sufficiently large reference set may not exist in most contexts would be problematic; how do we define the true probability of an event that is the first of its kind? For this, it seems the Bayesian interpretation will be more useful.

To wrap up, a random event is one that cannot be perfectly predicted, and so randomness can be simply defined as a lack of predictability. Elections where you only poll 10% of the population beforehand will be more random than elections where you poll 100% of the population beforehand. Coin flips where a human flips the coin will be more random than coin flips where a machine designed to favour Heads is doing the flipping.
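The polling example can be sketched as follows. This assumes a hypothetical fixed electorate of 1,001 voters (505 for candidate "A", 496 for "B"), simple random polling without replacement, and a prediction rule that calls whichever candidate leads in the sample (a tied sample counts as a miss). The function `prediction_accuracy` and all numbers are illustrative, not drawn from any real election.

```python
import random

def prediction_accuracy(votes, poll_fraction, n_trials=2_000, seed=0):
    """Share of simulated polls that correctly call the winner of a fixed electorate.

    Each trial polls a random subset of voters (without replacement) and
    predicts the candidate with more support in the sample; a tied sample
    counts as an incorrect call. Assumes the full electorate has no tie.
    """
    rng = random.Random(seed)
    actual_winner = "A" if votes.count("A") > votes.count("B") else "B"
    sample_size = round(poll_fraction * len(votes))
    correct = 0
    for _ in range(n_trials):
        sample = rng.sample(votes, sample_size)
        a_votes = sample.count("A")
        b_votes = sample_size - a_votes
        if a_votes != b_votes:
            predicted = "A" if a_votes > b_votes else "B"
            if predicted == actual_winner:
                correct += 1
    return correct / n_trials

# Hypothetical close election: 1,001 voters, 505 for A and 496 for B.
electorate = ["A"] * 505 + ["B"] * 496

for fraction in (0.1, 0.5, 1.0):
    print(fraction, prediction_accuracy(electorate, fraction))
```

Polling everyone calls the result correctly in every trial, so under the definition above that election is not random at all to the pollster, while polling 10% leaves real uncertainty about the same, already-determined outcome.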

Of course, all of this is up for debate. Also, these aren't very original thoughts, but are rather the product of reading various Wikipedia articles on interpreting probability and procrastinating on real work by talking with my economics friends about why Nate Silver doesn't understand probability.

matt courchene
