Of quantiles and expectiles: consistent scoring functions, Choquet representations and forecast rankings

In the practice of point prediction, it is desirable that forecasters receive a directive in the form of a statistical functional. For example, forecasters might be asked to report the mean or a quantile of their predictive distributions. When evaluating and comparing competing forecasts, it is then critical that the scoring function used for these purposes be consistent for the functional at hand, in the sense that the expected score is minimized when following the directive. We show that any scoring function that is consistent for a quantile or an expectile functional can be represented as a mixture of elementary or extremal scoring functions that form a linearly parameterized family. Scoring functions for the mean value and probability forecasts of binary events constitute important examples. The extremal scoring functions admit appealing economic interpretations of quantiles and expectiles in the context of betting and investment problems. The Choquet‐type mixture representations give rise to simple checks of whether a forecast dominates another in the sense that it is preferable under any consistent scoring function. In empirical settings it suffices to compare the average scores for only a finite number of extremal elements. Plots of the average scores with respect to the extremal scoring functions, which we call Murphy diagrams, permit detailed comparisons of the relative merits of competing forecasts.
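
As a minimal sketch of the dominance check and the Murphy diagram described above, the following Python code computes average elementary scores for quantile forecasts. The abstract does not state the explicit form of the elementary scores; the code assumes the commonly cited form for the α-quantile, in which the score at threshold θ equals 1 − α if y ≤ θ < x, α if x ≤ θ < y, and 0 otherwise, for forecast x and outcome y. The function names and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def elementary_quantile_score(x, y, theta, alpha):
    """Assumed elementary score for the alpha-quantile at threshold theta.

    Returns 1 - alpha if y <= theta < x, alpha if x <= theta < y, else 0.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    theta = np.asarray(theta, dtype=float)
    over = (y <= theta) & (theta < x)    # forecast above theta, outcome not
    under = (x <= theta) & (theta < y)   # outcome above theta, forecast not
    return (1.0 - alpha) * over + alpha * under


def murphy_curve(forecasts, outcomes, thetas, alpha):
    """Average elementary score at each threshold (one Murphy diagram curve)."""
    x = np.asarray(forecasts, dtype=float)[None, :]
    y = np.asarray(outcomes, dtype=float)[None, :]
    th = np.asarray(thetas, dtype=float)[:, None]
    return elementary_quantile_score(x, y, th, alpha).mean(axis=1)


def dominates(forecasts_a, forecasts_b, outcomes, alpha):
    """Empirical check: is A's average elementary score never above B's?

    Under the assumed form, each curve is a step function in theta that can
    only change at a forecast or outcome value, so a finite grid suffices.
    """
    thetas = np.unique(np.concatenate([forecasts_a, forecasts_b, outcomes]))
    curve_a = murphy_curve(forecasts_a, outcomes, thetas, alpha)
    curve_b = murphy_curve(forecasts_b, outcomes, thetas, alpha)
    return bool(np.all(curve_a <= curve_b))


# Hypothetical toy data: median forecasts (alpha = 0.5) from two forecasters.
y = np.array([0.0, 1.0, 2.0, 3.0])
forecast_a = np.array([0.5, 1.0, 1.5, 3.5])
forecast_b = np.array([2.0, -1.0, 4.0, 0.0])

print(dominates(forecast_a, forecast_b, y, alpha=0.5))
print(dominates(forecast_b, forecast_a, y, alpha=0.5))
```

Plotting `murphy_curve` against the thresholds for each forecaster gives the Murphy diagram. If one curve lies on or below the other at every threshold, the corresponding forecast has an average score that is no larger under any mixture of these elementary scores; under the mixture representation stated above, that covers every scoring function consistent for the quantile.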
