Forecast aggregation via recalibration

The average of many forecasts about a future event tends to outperform the individual forecasts. To improve forecast performance further, this paper develops and compares several models for calibrating and aggregating forecasts, exploiting the well-documented fact that individuals exhibit systematic biases during judgment and elicitation. All of the models recalibrate either the individual judgments or their mean via a two-parameter calibration function. They differ in whether (1) the calibration function is applied before or after averaging, (2) averaging is done in probability or log-odds space, and (3) individual differences are captured via hierarchical modeling. Of the non-hierarchical models, the one that first recalibrates the individual judgments and then averages them in log-odds space performs best relative to simple averaging, with a 26.7% improvement in Brier score and better performance on 86% of the individual problems. The hierarchical version of this model does slightly better in terms of mean Brier score (a 28.2% improvement) and slightly worse in terms of individual problems (better on 85%).
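The best non-hierarchical pipeline described above (recalibrate each judgment, then average in log-odds space) can be sketched as follows. This is a minimal illustration, not the paper's implementation. The abstract does not name the two-parameter calibration function, so the sketch assumes a linear-in-log-odds form, logit(p') = gamma * logit(p) + log(delta). The function names, the clipping constant `eps`, and the parameter values are hypothetical; the paper's parameter estimation, hierarchical or otherwise, is not reproduced here.

```python
import math


def logit(p):
    """Map a probability in (0, 1) to log-odds."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def recalibrated_log_odds(p, gamma, delta):
    """Assumed two-parameter calibration, expressed in log-odds:
    logit(p') = gamma * logit(p) + log(delta)."""
    return gamma * logit(p) + math.log(delta)


def calibrate_then_average(forecasts, gamma, delta, eps=1e-6):
    """Recalibrate each forecast, average in log-odds space,
    and return the aggregate as a probability."""
    # Clip so that forecasts of exactly 0 or 1 have finite log-odds.
    clipped = [min(max(p, eps), 1.0 - eps) for p in forecasts]
    mean_lo = sum(recalibrated_log_odds(p, gamma, delta)
                  for p in clipped) / len(clipped)
    return inv_logit(mean_lo)


def brier(p, outcome):
    """Brier score for one binary event (outcome is 0 or 1); lower is better."""
    return (p - outcome) ** 2


# Illustrative values only: three forecasts for an event that occurred.
forecasts = [0.6, 0.7, 0.8]
simple_mean = sum(forecasts) / len(forecasts)
aggregate = calibrate_then_average(forecasts, gamma=2.0, delta=1.0)
print(brier(simple_mean, 1), brier(aggregate, 1))
```

With gamma greater than 1, the assumed function pushes judgments away from 0.5, so the aggregate is more extreme than the simple mean. That lowers the Brier score when the crowd leans the right way and raises it otherwise, which is why the parameters would need to be estimated from data rather than fixed by hand.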
