Forecaster's Dilemma: Extreme Events and Forecast Evaluation

In public discussions of forecast quality, attention typically focuses on predictive performance in cases of extreme events. However, restricting conventional forecast evaluation methods to subsets of extreme observations has unexpected and undesired effects, and is bound to discredit skillful forecasts when the signal-to-noise ratio in the data-generating process is low. Conditioning on outcomes is incompatible with the theoretical assumptions of established forecast evaluation methods, thereby confronting forecasters with what we refer to as the forecaster's dilemma. For probabilistic forecasts, proper weighted scoring rules have been proposed as decision-theoretically justifiable alternatives for forecast evaluation with an emphasis on extreme events. Using theoretical arguments, simulation experiments, and a real data study on probabilistic forecasts of U.S. inflation and gross domestic product growth, we illustrate and discuss the forecaster's dilemma along with potential remedies.
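
The effect of conditioning on outcomes can be illustrated with a small simulation. The sketch below is not taken from the study itself. It assumes a hypothetical data-generating process in which the outcome is a standard normal signal plus standard normal noise, and compares three hypothetical Gaussian forecasters: an "ideal" one that knows the signal, a "climatological" one that ignores it, and an "extremist" one that adds a fixed upward bias of 2.5. Forecasts are scored with the continuous ranked probability score (CRPS, lower is better), first over all cases and then only over cases whose observation exceeds an assumed threshold of 2.

```python
import math
import random


def crps_normal(mean, sd, y):
    """Closed-form CRPS of a N(mean, sd^2) forecast at outcome y (lower is better)."""
    z = (y - mean) / sd
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sd * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def simulate(n=100_000, threshold=2.0, seed=1):
    """Return mean CRPS per forecaster over all cases and over extreme cases only.

    The process, the three forecasters, and the threshold are illustrative
    assumptions, not values taken from the abstract.
    """
    rng = random.Random(seed)
    names = ("ideal", "climatological", "extremist")
    total_all = dict.fromkeys(names, 0.0)
    total_ext = dict.fromkeys(names, 0.0)
    n_ext = 0
    for _ in range(n):
        mu = rng.gauss(0.0, 1.0)      # signal available to the forecasters
        y = mu + rng.gauss(0.0, 1.0)  # outcome = signal + unit-variance noise
        scores = {
            "ideal": crps_normal(mu, 1.0, y),                       # true conditional law
            "climatological": crps_normal(0.0, math.sqrt(2.0), y),  # true marginal law
            "extremist": crps_normal(mu + 2.5, 1.0, y),             # biased upward
        }
        is_extreme = y > threshold  # conditioning on the outcome, not the forecast
        if is_extreme:
            n_ext += 1
        for name in names:
            total_all[name] += scores[name]
            if is_extreme:
                total_ext[name] += scores[name]
    overall = {name: total_all[name] / n for name in names}
    extreme = {name: total_ext[name] / n_ext for name in names}
    return overall, extreme


overall, extreme = simulate()
for name in overall:
    print(f"{name:15s} all cases: {overall[name]:.3f}   y > 2 only: {extreme[name]:.3f}")
```

Under these assumptions, the ideal forecaster should attain the lowest mean CRPS over all cases, while the restriction to large observations should favor the biased extremist forecaster, which is the dilemma described above. The sketch only demonstrates the problem; it does not implement the weighted scoring rules proposed as a remedy.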
