Probabilistic Forecasts, Calibration and Sharpness
JRSSB Submission B 6257 Revision 1

Probabilistic forecasts of a continuous variable take the form of predictive densities or predictive cumulative distribution functions. We propose a diagnostic approach to the evaluation of predictive performance that is based on the paradigm of maximizing the sharpness of the predictive distributions subject to calibration. Calibration refers to the statistical consistency between the distributional forecasts and the observations and is a joint property of the predictions and the events that materialize. Sharpness refers to the concentration of the predictive distributions and is a property of the forecasts only. A simple theoretical framework phrased in terms of a game between nature and forecaster allows us to distinguish probabilistic calibration, exceedance calibration and marginal calibration. We propose and study tools for checking calibration and sharpness, among them the probability integral transform (PIT) histogram, marginal calibration plots, the sharpness diagram and proper scoring rules. The diagnostic approach is illustrated by an assessment and ranking of probabilistic forecasts of wind speed at the Stateline wind energy center in the U.S. Pacific Northwest. In combination with cross-validation or in the time series context, our proposal provides very general, nonparametric alternatives to the use of information criteria for model diagnostics and model selection.
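The diagnostics named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the Gaussian predictive distributions, the simulated "nature" (a hidden mean drawn from N(0, 1) with unit-variance noise), the two forecasters, and all function names are hypothetical choices made for this example. The sketch computes PIT values (the predictive CDF evaluated at the observation), the width of a central prediction interval as a sharpness measure, and the continuous ranked probability score (CRPS), a proper scoring rule with a known closed form for Gaussian forecasts.

```python
import math
from statistics import NormalDist

import numpy as np


def norm_cdf(x):
    """Standard normal CDF, applied elementwise to a NumPy array."""
    x = np.asarray(x, dtype=float)
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard normal density, applied elementwise."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * x ** 2) / math.sqrt(2.0 * math.pi)


def pit_values(y, mu, sigma):
    """PIT: the Gaussian predictive CDF N(mu, sigma^2) evaluated at y."""
    return norm_cdf((y - mu) / sigma)


def pit_histogram(pit, bins=10):
    """Relative frequencies of PIT values in equal-width bins on [0, 1]."""
    counts, _ = np.histogram(pit, bins=bins, range=(0.0, 1.0))
    return counts / len(pit)


def central_interval_width(sigma, level=0.9):
    """Sharpness: width of the central `level` interval of N(mu, sigma^2)."""
    q = NormalDist().inv_cdf(0.5 + level / 2.0)
    return 2.0 * q * np.asarray(sigma, dtype=float)


def crps_gaussian(y, mu, sigma):
    """Closed-form CRPS of a Gaussian forecast (smaller is better)."""
    z = (y - mu) / sigma
    return sigma * (z * (2.0 * norm_cdf(z) - 1.0)
                    + 2.0 * norm_pdf(z) - 1.0 / math.sqrt(math.pi))


# Assumed toy setup: nature draws a mean mu_t ~ N(0, 1), then y_t ~ N(mu_t, 1).
rng = np.random.default_rng(0)
n = 10_000
mu_t = rng.standard_normal(n)
y = mu_t + rng.standard_normal(n)

# Two hypothetical forecasters: one knows mu_t, one issues the
# unconditional distribution N(0, 2) every time.
forecasters = {
    "ideal": (mu_t, np.ones(n)),
    "climatological": (np.zeros(n), np.full(n, math.sqrt(2.0))),
}

for name, (mu, sigma) in forecasters.items():
    hist = pit_histogram(pit_values(y, mu, sigma))
    width = central_interval_width(sigma).mean()
    score = crps_gaussian(y, mu, sigma).mean()
    print(f"{name}: PIT hist {np.round(hist, 3)}, "
          f"mean 90% width {width:.2f}, mean CRPS {score:.3f}")
```

In this toy setup both forecasters should yield a roughly flat PIT histogram, so the histogram alone does not separate them. The interval widths and the mean CRPS do, which is the sense in which sharpness is maximized subject to calibration.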
