Forecast aggregation via recalibration

It is known that the average of many forecasts about a future event tends to outperform the individual assessments. With the goal of further improving forecast performance, this paper develops and compares a number of models for calibrating and aggregating forecasts that exploit the well-known fact that individuals exhibit systematic biases during judgment and elicitation. All of the models recalibrate judgments or mean judgments via a two-parameter calibration function, and differ in terms of whether (1) the calibration function is applied before or after the averaging, (2) averaging is done in probability or log-odds space, and (3) individual differences are captured via hierarchical modeling. Of the non-hierarchical models, the one that first recalibrates the individual judgments and then averages them in log-odds is the best relative to simple averaging, with 26.7% improvement in Brier score and better performance on 86% of the individual problems. The hierarchical version of this model does slightly better in terms of mean Brier score (28.2%) and slightly worse in terms of individual problems (85%).
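As a rough illustration of the best non-hierarchical pipeline described above, the sketch below recalibrates each judgment with a two-parameter function and then averages the results in log-odds space, with simple probability averaging and the Brier score included for comparison. The abstract does not state the functional form, so the linear-in-log-odds form used here, the parameter names `gamma` and `delta`, all function names, and the example numbers are assumptions for illustration, not the authors' fitted model.

```python
import math

EPS = 1e-9  # assumed clipping bound so that log-odds stay finite


def logit(p):
    """Map a probability to log-odds, clipping away from 0 and 1."""
    p = min(max(p, EPS), 1.0 - EPS)
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def recalibrate(p, gamma, delta):
    """Two-parameter recalibration, assumed linear in log-odds:
    logit(c(p)) = gamma * logit(p) + log(delta)."""
    return inv_logit(gamma * logit(p) + math.log(delta))


def average_probability(ps):
    """Simple (unweighted) average in probability space."""
    return sum(ps) / len(ps)


def average_log_odds(ps):
    """Unweighted average in log-odds space, mapped back to a probability."""
    return inv_logit(sum(logit(p) for p in ps) / len(ps))


def calibrate_then_average_log_odds(ps, gamma, delta):
    """Recalibrate each individual judgment, then average in log-odds."""
    return average_log_odds([recalibrate(p, gamma, delta) for p in ps])


def brier_score(forecast, outcome):
    """Squared error between a probability forecast and a 0/1 outcome."""
    return (forecast - outcome) ** 2


if __name__ == "__main__":
    judgments = [0.6, 0.7, 0.8]  # hypothetical forecasts for one event
    gamma, delta = 2.0, 1.0      # hypothetical calibration parameters
    simple = average_probability(judgments)
    model = calibrate_then_average_log_odds(judgments, gamma, delta)
    outcome = 1                  # suppose the event occurred
    print("simple average:", simple, "Brier:", brier_score(simple, outcome))
    print("recalibrated:  ", model, "Brier:", brier_score(model, outcome))
```

In this assumed form, `gamma` greater than 1 pushes judgments away from 0.5 (correcting underconfidence) and `gamma` less than 1 pulls them toward 0.5, while `delta` shifts all judgments up or down. In practice both parameters would be estimated from data rather than fixed by hand.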

Brandon M. Turner | Mark Steyvers | David V. Budescu | Edgar C. Merkle | Thomas S. Wallsten
