Towards improving the framework for probabilistic forecast evaluation

The evaluation of forecast performance plays a central role both in the interpretation and use of forecast systems and in their development. Different evaluation measures (scores) are available, often quantifying different characteristics of forecast performance. The properties of several proper scores for probabilistic forecast evaluation are contrasted and then used to interpret decadal probability hindcasts of global mean temperature. The Continuous Ranked Probability Score (CRPS), the Proper Linear (PL) score, and I. J. Good’s logarithmic score (also referred to as Ignorance) are compared; although information from all three may be useful, the logarithmic score has an immediate interpretation and is not insensitive to forecast busts. Neither CRPS nor PL is local; this is shown to produce counterintuitive evaluations by CRPS. Benchmark forecasts from empirical models like Dynamic Climatology place the scores in context. Comparing scores for forecast systems based on physical models (in this case HadCM3, from the CMIP5 decadal archive) against such benchmarks is more informative than internal comparison of systems based on similar physical simulation models with each other. It is shown that a forecast system based on HadCM3 outperforms Dynamic Climatology in decadal global mean temperature hindcasts; Dynamic Climatology previously outperformed a forecast system based upon HadGEM2, and reasons for these results are suggested. Forecasts of aggregate data (5-year means of global mean temperature) are, of course, narrower than forecasts of annual averages due to the suppression of variance; while the average “distance” between the forecasts and a target may be expected to decrease, little if any discernible improvement in probabilistic skill is achieved.
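
As a rough illustration of how the three scores respond to a forecast bust, the sketch below evaluates each of them for a Gaussian forecast density. It is a minimal sketch under stated assumptions, not the paper's implementation: the Gaussian form of the forecast, the function names, and the sign convention (lower is better for all three scores) are choices made here for illustration. The PL score is assumed to take the quadratic form, the integral of p(z)^2 over z minus 2 p(Y), and the CRPS uses the standard closed form for a Gaussian forecast.

```python
import math

# Illustrative sketch only: a Gaussian forecast N(mu, sigma^2) is assumed,
# and every function name below is hypothetical. Lower scores are better.

def gaussian_pdf(y, mu, sigma):
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def gaussian_cdf(y, mu, sigma):
    z = (y - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ignorance(y, mu, sigma):
    # Logarithmic score in bits: depends only on the density at the outcome,
    # which is what makes it a local score.
    return -math.log2(gaussian_pdf(y, mu, sigma))

def crps_gaussian(y, mu, sigma):
    # Standard closed form of the CRPS for a Gaussian forecast.
    z = (y - mu) / sigma
    std_pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    std_cdf = gaussian_cdf(y, mu, sigma)
    return sigma * (z * (2.0 * std_cdf - 1.0) + 2.0 * std_pdf - 1.0 / math.sqrt(math.pi))

def proper_linear(y, mu, sigma):
    # Assumed quadratic form: integral of p(z)^2 dz minus 2 p(y).
    # For a Gaussian the integral equals 1 / (2 sigma sqrt(pi)).
    return 1.0 / (2.0 * sigma * math.sqrt(math.pi)) - 2.0 * gaussian_pdf(y, mu, sigma)

if __name__ == "__main__":
    mu, sigma = 0.0, 1.0
    for y in (0.0, 2.0, 5.0):  # outcome at the mean, in the tail, and a bust
        print(y, ignorance(y, mu, sigma), crps_gaussian(y, mu, sigma),
              proper_linear(y, mu, sigma))
```

For this Gaussian case, Ignorance grows quadratically with the standardized distance of the outcome from the forecast mean, the CRPS grows only about linearly, and the PL score saturates once the density at the outcome is negligible, which is one way to see the differing sensitivity to busts. The PL and CRPS values also depend on the forecast density away from the outcome, so neither is local.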
