Bayesian Ensembles of Binary-Event Forecasts: When Is It Appropriate to Extremize or Anti-Extremize?

Many organizations face critical decisions that rely on forecasts of binary events, such as whether or not a borrower will default on a loan. In these situations, organizations often gather forecasts from multiple experts or models, which raises the question of how to aggregate them. Because linear combinations of probability forecasts are known to be underconfident, we introduce a class of aggregation rules, called Bayesian ensembles, that are non-linear in the experts' probabilities. These ensembles are generalized additive models of the experts' probabilities and have three key properties: they are coherent, i.e., consistent with the Bayesian view; they can aggregate calibrated or miscalibrated forecasts; and they are often more extreme, and therefore more confident, than the commonly used linear opinion pool. Empirically, we demonstrate that our ensemble can be easily fit to real data using a generalized linear modeling framework. We use this framework to aggregate forecasts of binary events in two publicly available datasets; the forecasts come from several leading statistical and machine learning algorithms. Out of sample, our Bayesian ensemble improves on both the linear opinion pool and each of the individual algorithms considered.
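The abstract describes the ensemble only at a high level. As a minimal sketch, assuming the simplest member of an additive-in-transformed-probabilities family, in which each expert enters through its log-odds with one coefficient (a logit-linear pool, not necessarily the authors' exact specification), the aggregate can be fit as a logistic-regression GLM by Newton-Raphson (IRLS). The helper names `linear_pool`, `logit_ensemble`, `fit_logit_ensemble`, and `simulate`, and the synthetic data, are hypothetical illustrations. When all experts report the same p (so the linear pool also returns p) and the intercept is zero, coefficients summing above 1 push the aggregate away from 0.5 (extremizing), while coefficients summing below 1 pull it toward 0.5 (anti-extremizing).

```python
import math
import random


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_pool(probs, weights):
    """Linear opinion pool: weighted average of the experts' probabilities."""
    return sum(w * p for w, p in zip(weights, probs))


def logit_ensemble(probs, intercept, coefs):
    """Hypothetical ensemble that is additive in the experts' log-odds."""
    return sigmoid(intercept + sum(b * logit(p) for b, p in zip(coefs, probs)))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_logit_ensemble(forecasts, outcomes, iters=20):
    """Maximum-likelihood logistic regression of 0/1 outcomes on the experts'
    log-odds, via Newton-Raphson: beta += (X'WX)^-1 X'(y - mu)."""
    rows = [[1.0] + [logit(p) for p in probs] for probs in forecasts]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, y in zip(rows, outcomes):
            mu = sigmoid(sum(b * xi for b, xi in zip(beta, x)))
            w = mu * (1.0 - mu)  # binomial GLM working weight
            for i in range(k):
                grad[i] += (y - mu) * x[i]
                for j in range(k):
                    hess[i][j] += w * x[i] * x[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta[0], beta[1:]


def simulate(n=4000, seed=0):
    """Hypothetical synthetic data: the event's true log-odds is the sum of two
    independent signals, and expert i reports sigmoid(signal_i)."""
    rng = random.Random(seed)
    forecasts, outcomes = [], []
    for _ in range(n):
        s1, s2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        outcomes.append(1 if rng.random() < sigmoid(s1 + s2) else 0)
        forecasts.append([sigmoid(s1), sigmoid(s2)])
    return forecasts, outcomes
```

On this simulated data the fitted coefficients should land near 1 each, so they sum to roughly 2 and the fitted ensemble extremizes relative to the linear pool. Real forecasts may instead call for coefficients summing below 1.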
