Forecasting the Accuracy of Forecasters from Properties of Forecasting Rationales

Geopolitical forecasting tournaments have stimulated the development of methods for improving probability judgments of real-world events. But these innovations have focused on easier-to-quantify variables, like personnel selection, training, teaming, and crowd aggregation, and have bypassed messier constructs, like the qualitative properties of forecasters’ rationales. Here we adapt methods from natural language processing (NLP) and computational text analysis to identify distinctive reasoning strategies in the rationales of top forecasters, including: (a) cognitive styles, such as dialectical complexity, that gauge tolerance of clashing perspectives and efforts to blend them into coherent conclusions; (b) the use of comparison classes or base rates to inform forecasts; and (c) metrics derived from the Linguistic Inquiry and Word Count (LIWC) program. Applying these tools to multiple forecasting tournaments and to forecasters of widely varying skill (from Mechanical Turkers to carefully culled “superforecasters”) revealed that: (a) top forecasters show higher dialectical complexity in their rationales, use more comparison classes, and offer more past-focused rationales; and (b) experimental interventions that boost accuracy, like training and teaming, also influence the NLP profiles of rationales, nudging them in a “superforecaster-like” direction.
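
To illustrate the kind of analysis the abstract describes, the sketch below computes a few crude, dictionary-based proxies for rationale properties (past-focus words, comparison-class markers, and contrast connectives as a rough stand-in for dialectical complexity) and correlates them with forecasters’ Brier scores. This is a minimal sketch under stated assumptions: the word lists, feature names, and the `rationales`/`brier_scores` data are hypothetical placeholders, not the authors’ actual LIWC categories or complexity-coding scheme.

```python
# Hypothetical sketch: dictionary-based rationale features correlated with accuracy.
# Word lists and toy data are illustrative placeholders, not the paper's actual measures.
import re
from statistics import mean

# Assumed proxy dictionaries (not the LIWC or integrative-complexity lexicons).
PAST_FOCUS = {"was", "were", "had", "previously", "historically", "last"}
COMPARISON_CLASS = ("base rate", "historically", "on average", "similar cases", "in the past")
CONTRAST = ("however", "but", "although", "on the other hand", "yet")  # dialectical-complexity stand-in

def feature_rates(text: str) -> dict:
    """Return per-100-word rates of each proxy feature in one rationale."""
    words = re.findall(r"[a-z']+", text.lower())
    n = max(len(words), 1)
    lower = text.lower()
    return {
        "past_focus": 100 * sum(w in PAST_FOCUS for w in words) / n,
        "comparison_class": 100 * sum(lower.count(p) for p in COMPARISON_CLASS) / n,
        "contrast": 100 * sum(lower.count(p) for p in CONTRAST) / n,
    }

def pearson(xs, ys):
    """Plain Pearson correlation, with no external dependencies."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else float("nan")

# Toy data: one rationale and one Brier score per forecaster (lower Brier = more accurate).
# In practice this would run over many forecasters and many questions.
rationales = [
    "Historically, similar cases resolved yes about 30% of the time, but sanctions may change that.",
    "I just feel this will happen; the leadership seems determined.",
]
brier_scores = [0.08, 0.42]

features = [feature_rates(r) for r in rationales]
for name in ("past_focus", "comparison_class", "contrast"):
    r = pearson([f[name] for f in features], brier_scores)
    print(f"{name}: r with Brier score = {r:.2f}")
```

In the study itself, the abstract indicates that rationale properties were measured with LIWC-derived metrics and with dialectical-complexity and comparison-class coding rather than ad hoc word lists; the sketch only shows the general shape of relating text-derived features to accuracy.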
