Forecasts That Don’t Mean Much
Comedian George Carlin had a character called Al Sleet, otherwise known as the Hippie-Dippie Weatherman. Sleet’s speciality was vague or ambiguous weather forecasts, and his most famous was:
Tonight’s forecast: Dark. Continued mostly dark tonight, changing to widely scattered light towards morning.
Everyone can find this funny, but it takes a special kind of intellect, like Philip Tetlock’s, to take it seriously. Why seriously? Because Tetlock has spent much of his career studying forecasting: why we get it wrong (as we often do), and whether we can get it right (which, with discipline, we sometimes can).
He has just come out with a new book, Superforecasting: The Art and Science of Prediction, which I am going to discuss in just a second.
But first, we have to talk about dart-throwing chimpanzees.
Tetlock became famous for a book he published in 2005, Expert Political Judgment: How Good Is It? How Can We Know? It was the culmination of about 20 years’ worth of work, much of which was just plain scientific hard slog. He wanted to know how accurate professional political pundits and experts are when they make predictions, so he did a radical thing.
He asked them to make predictions.
Then he waited, and did the math, to see how many of these came true.
And there was a lot of math to do, because this was a big and comprehensive study. Tetlock assembled over 280 professional forecasters, and got them to make more than 82,000 predictions.
His simple finding was that the experts didn’t do any better than chance or, as Tetlock colourfully put it, “a dart-throwing chimpanzee”. Such a conclusion would have made for a pretty short book, but Tetlock put quite a bit of time and thought into teasing out why they did so poorly.
And the basic, albeit indirect, answer is: They produce forecasts that don’t mean much.
Mushy Language
The biggest single issue is a kind of insidious, self-supporting lack of precision. Tetlock talks about the mushy kind of language typically used by professional forecasters. Theirs is the world of “may” and “might” forecasts, of “could”s and “can”s. Using such language, it is almost impossible to be proven wrong.
In his new book, Superforecasting, Tetlock writes…
“Virtually every political pundit on the planet operates under the same tacit ground rules. They make countless claims about what lies ahead but couch their claims in such vague verbiage that it is impossible to test them. How should we interpret intriguing claims like ‘expansion of NATO could trigger a ferocious response from the Russian bear and may even lead to a new Cold War’ or ‘the Arab Spring might signal that the days of unaccountable autocracy in the Arab world are numbered’ or…? The key terms in these semantic dances, may or could or might, are not accompanied by guidance on how to interpret them.”
– Superforecasting, pg 291
(Emphasis and ellipses by the author)
Such forecasts also tend to lack a precise time frame. This of course works to the forecaster’s advantage. Imagine you predict that failure to reduce the national debt might result in a financial crisis of Grecian proportions, but then nothing happens for a year. When someone points out the apparent failure of your forecast, you can wriggle off the hook easily with a dismissive, “Yeah, not yet”. Leave a prediction long enough, and it will either come true or be forgotten. The forecaster wins in both cases.
The net result:
“All this makes it impossible to track accuracy across time and questions. It also gives pundits endless flexibility to claim credit when something happens (I told you it could) and to dodge blame when it does not (I merely said it could happen). We shall encounter many examples of such linguistic mischief.”
– Ibid, pg 291
A Failure to Evaluate Accuracy
As I mentioned, this lack of precision is self-supporting. So it leads to the second half of the problem.
A forecast that is hard to pin down is also hard to evaluate. So we simply never bother to tally up successful versus failed forecasts, leaving no scientific way to decide whether the expert is expert or not. High-profile pundits are free to remind the world of the things they got right (post facto, and in the process morphing a lot of “might”s into “will”s), while allowing their “not yet”s to simmer or sink into obscurity.
This leaves an evaluative vacuum which we tend to fill with other, far less useful metrics.
Today’s professional forecasters — not just TV’s talking heads, but national security advisers, insider political strategists, and so on — tend to be evaluated more on how they sound than on what they say. The most successful forecasters (measured not by whether they get it right, but by how in-demand they are) actually manage to combine mushy forecasts with an aura of certainty. So the “successful” forecasters are the ones who shout their “might”s the loudest.
(As an aside, we are seeing a similar phenomenon in a closely related field, politics itself: the current Republican presidential nomination process has been notable for the paucity of quality debate on substantive issues. So who fills that void? The person who shouts the loudest.)
Ironically, Tetlock discovered in his original study that there was an inverse correlation between the confidence of the forecaster and his or her success rate. The more sure they were, the less often they got it right. It is not hard to see why.
Hedgehogs and Foxes
Forecasting is difficult, particularly in chaotic fields like politics. A huge number of small, diverse events contribute to the end result. Tracking them all is really challenging, so many forecasters take a reductionist approach, distilling a multi-faceted problem down to what they consider to be the one key issue. Tetlock (borrowing from a much-cited essay by Isaiah Berlin entitled The Hedgehog and the Fox) calls these people hedgehogs, because they know one big thing. They are ideologues, using an intellectual framework to guide their thinking. Because of their reductionism, such people tend to be dead certain — whether they are right or not.
Conversely, there are the foxes: forecasters of a less ideological bent, who see the world as a very complex, interrelated organism. To produce their forecasts, they do a balancing act, weighing information from many varied sources. A fox breaks the question down into as many component parts as possible, and looks at the probabilities of each. Foxes spend a lot of time doing “on the one hand/on the other hand” analysis. On the whole, this tends to make them better informed, but less cocksure of themselves (or at least more suspicious of black-and-white answers).
One would think that, given these characteristics, hedgehogs would produce the most unambiguous forecasts. They are sure of themselves; foxes less so. Paradoxically, Tetlock has discovered this is not how things play out. Regarding the comparative success rates between the two, he writes:
“Foxes beat hedgehogs. And the foxes didn’t just win by acting like chickens, playing it safe with 60% and 70% forecasts where hedgehogs boldly went with 90% and 100%. Foxes beat hedgehogs on both calibration and resolution. Foxes had real foresight. Hedgehogs didn’t.”
– Ibid, pg 69
The reason is that the very best forecasts are the ones that acknowledge the inherent uncertainty of the problem. It is just downright wrong to be absolutely certain about something that is fundamentally uncertain. But this doesn’t mean the fox needs to descend into mushy language. The trick is to state the question precisely, and create a probability estimate as to whether it will or won’t come true.
Not only does this allow the fox to embrace precise language and strict time frames, it allows her to acknowledge her justified uncertainty by offering a reasoned probability rather than a (naively) sure response.
The hedgehog, on the other hand, relies on vague phrasing to give his forecasts the semblance of certainty. But certainty in an uncertain world is a trap, regardless of how good it sounds at the time.
“Heads or Tails?” Is Not a Yes-or-No Question
Let’s work through an extreme example. Take two forecasters, a headstrong hedgehog and a dyed-in-the-fur fox, and a fair coin. We are going to flip it 6 times, but beforehand, pose the following question:
Will the coin come up heads all six times?
It is tempting to say No, which is what the hedgehog does. The odds are just too slim that you’ll get 6 heads in a row. In fact, the probability is 0.015625 (or exactly 1 in 64). So the hedgehog firmly says No. The fox, on the other hand, replies “All 6 heads with a probability of 0.015625”.
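For the record, the arithmetic behind that number is nothing more than six independent fair flips multiplying together:

$$P(\text{6 heads}) = \left(\frac{1}{2}\right)^{6} = \frac{1}{64} = 0.015625$$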
At first glance, the hedgehog’s response is more satisfying; you feel you can have more confidence in it. The fox, on the other hand (to mix animal metaphors), seems to be weaseling out of answering entirely.
But from Tetlock’s point of view, the hedgehog’s response shows a serious lack of precision. If we break the question down, we can see why. Let’s do that by asking another question, again before the coin has even been flipped:
Will the first flip of the coin come up heads?
Suddenly, it is not at all easy to be certain. In fact, it is impossible. Any hedgehog who thinks he has a Yes-or-No answer to this question is deluding himself, even if his forecast subsequently turns out to be right. The right probability is 0.5, or 1 in 2. Neither a Yes nor a No is appropriate.
Statistically, the hedgehog is now in something of a contradictory position, because he claims to be absolutely certain about the outcome of a complex series of events — 6 coin flips — yet has to admit he is completely uncertain about the first of its constituent parts. Where does his certainty come from?
Let’s imagine that first flip comes up heads. We ask the hedgehog and the fox our next question, which is a simple repetition of the first:
Will the coin come up heads all six times?
Here the hedgehog has to either hold or fold. He claimed certainty for all 6 flips, when he must have known at the outset that the first toss could easily come up heads; so he really has no room to revise his forecast without admitting he was just plain wrong. The question now is effectively whether the next 5 tosses will all come up heads, the probability of which is 0.03125, or 1 in 32. Those are still pretty long odds, and they might serve to save the hedgehog. Unable to change his mind without losing face, the hedgehog answers with the same resounding No.
The fox, on the other hand, can revise her position. Her first forecast was a probability statement; the facts have changed, and with them the probability, so she is happy to revise. Her new forecast is “All heads with a probability of 0.03125”.
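In probability terms, all the fox has done is condition on what has already happened: with the first head banked, only the remaining 5 flips are uncertain:

$$P(\text{6 heads} \mid \text{first flip is heads}) = \left(\frac{1}{2}\right)^{5} = \frac{1}{32} = 0.03125$$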
Revision, Revision, Revision
Good forecasters revise repeatedly. This does not represent spinelessness, but a commitment to accuracy. Tetlock cites the (now considered apocryphal) quote attributed to John Maynard Keynes:
“When the facts change, I change my mind. What do you do, sir?”
The coin toss scenario makes clear why this must be good advice.
Let’s fast-forward to the final round of forecasts. Imagine 5 of the 6 flips have been made, and all have come up heads. We repeat the first question.
Now the hedgehog is really squirming, as the same principle that forced him to reaffirm his original dead-cert forecast after the first flip has applied throughout. At no time has he been able to change his mind without admitting he must have been wrong. So he has to stick with his No. Our fox, on the other hand, has been revising constantly as new data has come in, and now is perfectly free to revise to “All heads with a probability of 0.5”.
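If you would like to see the fox’s bookkeeping in one place, here is a minimal Python sketch of her revision rule (the function name is mine, purely for illustration): once some heads are on the table with no tails, only the remaining flips are uncertain.

```python
# A sketch of the fox's revision rule in the six-flip example: after
# heads_so_far heads have been observed (and no tails), the probability
# that all six flips come up heads is the chance that the remaining
# flips are all heads, i.e. 0.5 ** (6 - heads_so_far).

FLIPS = 6

def prob_all_heads(heads_so_far: int, total_flips: int = FLIPS) -> float:
    """P(every flip is heads), given heads_so_far heads and no tails yet."""
    return 0.5 ** (total_flips - heads_so_far)

for heads in range(FLIPS):
    print(f"after {heads} head(s): P(all six heads) = {prob_all_heads(heads):.6f}")

# Prints 0.015625 (1/64) before any flips, 0.031250 (1/32) after the
# first head, and 0.500000 (1/2) once five heads are on the table.
```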
At the risk of leaving the reader with a cliff-hanger, there is actually no need for the final flip. At this stage, the fox is already right, and has been right all along. The hedgehog is down to a guess; well, not down to a guess, because in fact he has been guessing all along.
Imprecision and misplaced certainty are the fatal double-whammy for hedgehog forecasters. Together they represent the major reason why people making predictions get it wrong.
In the second part of this posting, we’ll find out how to get it right (a certain percentage of the time).