This blog covers the general topic of financial markets.

Superforecasting: the Good Judgment Project

first posted: 2022-10-01 08:27:44.642261

Philip Tetlock is a political science professor at UPenn. He embarked on a project to assess the accuracy of judgment in high-stakes situations.


Accountability in Judgment

The media continuously asks pundits to air opinionated takes on political and economic matters with no accountability or follow-up. The statements are crafted to convey emotions and interpretations while being sufficiently vague to escape accuracy checking.

Forecasting pundits are part of the great mainstream media circus. Measuring the accuracy of their statements was never the intention and it could only detract from the show.

Tetlock's experience with the Intelligence Community (CIA analysts, ...) is that they also dislike being checked, as they feel they have much to lose and little to gain from the process.

As no WMD were found in Iraq despite the IC being "certain" that Iraq had developed them, the IC started reporting probabilities rather than certainties. For instance, the intelligence locating Bin Laden's compound in Pakistan was presented to President Obama with a 70% probability.

The Good Judgment Project recruited forecasters from the general public and assessed their performance.

Good Judgment Project

Good judgment starts with making forecasts that can be scored as unambiguously right or wrong. In practice, this means assigning probabilities to events rather than the all-in or all-out calls that led to the Iraq WMD intelligence failure.

While forecast accuracy cannot be measured from a single forecast, the quality of forecasts can be assessed over many predictions. Tetlock advises using the Brier score, a sum-of-squared-errors loss function, for this purpose. The Brier score can be decomposed into three components:

  • reliability: a $\frac{1}{N}\sum_k n_k(\bar{f}_k-\bar{o}_k)^2$ term measuring the calibration error within each forecast bin
  • resolution: a $\frac{1}{N}\sum_k n_k(\bar{o}_k-\bar{o})^2$ term measuring how sharply forecasts differentiate between bins
  • uncertainty: a $\bar{o}(1-\bar{o})$ term capturing the irreducible uncertainty of a Bernoulli trial

Resolution enters the decomposition with a negative sign: the more sharply a forecaster's predictions differentiate events, the lower (better) the Brier score.
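The decomposition can be illustrated with a short Python sketch. The data below is made up for illustration; binning by unique forecast values makes the decomposition exact:

```python
import numpy as np

# Hypothetical forecasts (probabilities) and binary outcomes for illustration.
forecasts = np.array([0.9, 0.8, 0.7, 0.3, 0.2, 0.1, 0.9, 0.6])
outcomes = np.array([1, 1, 0, 0, 0, 1, 1, 0])

# Overall Brier score: mean squared error of probabilities vs outcomes.
brier = np.mean((forecasts - outcomes) ** 2)

# Murphy decomposition: brier = reliability - resolution + uncertainty,
# binning forecasts by their unique probability values.
n = len(forecasts)
o_bar = outcomes.mean()  # overall base rate
reliability = resolution = 0.0
for f_k in np.unique(forecasts):
    mask = forecasts == f_k
    n_k = mask.sum()
    o_k = outcomes[mask].mean()  # observed frequency within bin k
    reliability += n_k / n * (f_k - o_k) ** 2
    resolution += n_k / n * (o_k - o_bar) ** 2
uncertainty = o_bar * (1 - o_bar)

# The identity holds exactly under this binning.
assert np.isclose(brier, reliability - resolution + uncertainty)
```

Note the signs: a forecaster improves the score either by being better calibrated (lower reliability term) or by being more discriminating (higher resolution term).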

It should be noted that log-based loss functions are also popular in statistical learning. Log loss penalizes confident wrong predictions far more heavily than the Brier score does.
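A quick comparison of the two penalties for a confidently wrong forecast (the probability values here are just an illustration):

```python
import math

# A forecaster says 99% "yes" and the event does not happen (outcome = 0).
p = 0.99
brier_penalty = (p - 0) ** 2     # 0.9801 -- bounded above by 1
log_penalty = -math.log(1 - p)   # about 4.6 -- grows without bound as p -> 1

# A milder miss: 60% "yes", outcome = 0.
brier_mild = 0.6 ** 2            # 0.36
log_mild = -math.log(1 - 0.6)    # about 0.92
```

Relative to the mild miss, log loss punishes the confident miss about five times harder, while the Brier penalty ratio is under three: the log criterion is the harsher judge of misplaced certainty.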

What Happened with the GJP

  • the aggregate of all forecasters performed well once the forecasts were extremized (pushed away from 0.5)
  • some superforecasters performed very well on their own
  • teams of superforecasters did even better
  • superforecasters were numerate and used feedback to improve their Brier scores
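Extremizing compensates for the fact that averaging many forecasts washes out individual confidence. A minimal sketch of one common transform; the exponent `a` and the sample forecasts below are illustrative assumptions, not GJP's exact procedure:

```python
import numpy as np

def extremize(p, a=2.5):
    """Push a probability away from 0.5; a > 1 sharpens, a = 1 is the identity."""
    return p**a / (p**a + (1 - p)**a)

# Hypothetical individual forecasts for a single event:
probs = np.array([0.65, 0.70, 0.60, 0.75])
raw = probs.mean()      # simple average: 0.675
sharp = extremize(raw)  # pushed toward 1, since raw > 0.5
```

The transform leaves 0.5 fixed, pushes probabilities above 0.5 toward 1, and pushes those below 0.5 toward 0.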

The limitations of GJP

  • GJP questions generally had horizons of 3 months to 2 years
  • GJP demands continuous updating and favors news junkies
  • truly important events, such as the Arab Spring or World War I, had random catalysts that were unforecastable

Ten Commandments for Better Forecasts

Tetlock estimates that the following prescriptions improve forecast accuracy by about 10%.

  • triage: avoid wasting time on forecasts that are too easy or too cloudy
  • break intractable problems into tractable sub-problems (Fermi estimates)
  • balance inside and outside views: borrow base rates from comparable domains when the transposition applies
  • balance under- and overreacting to news: the key is finding the right precursor indicators for updating beliefs
  • look at the clashing causal forces at work
  • distinguish as many degrees of doubt as the problem permits, but no more
  • balance prudence and decisiveness in confidence
  • examine errors after mistakes, and keep notes to avoid hindsight bias
  • bring out the best in others, and let others bring out the best in you
  • error balancing comes through practice

Good Judgment Dashboard

This dashboard presents a few estimates:

  • a Taiwan/China kinetic conflict in 2022 at 6% probability
  • Putin ceasing to be Russia's president in 2022 at 3% probability