Sample Stratification in Verification of Ensemble Forecasts of Continuous Scalar Variables: Potential Benefits and Pitfalls

Abstract. In the verification field, stratification is the process of dividing the sample of forecast–observation pairs into quasi-homogeneous subsets, in order to learn more about how forecasts behave under specific conditions. A general framework for stratification is presented for the case of ensemble forecasts of continuous scalar variables. A distinction is made between forecast-based, observation-based, and external-based stratification, depending on the criterion on which the sample is stratified. The formalism is applied to two widely used verification measures: the continuous ranked probability score (CRPS) and the rank histogram. For both, new graphical representations that synthesize the added information are proposed. Based on the definition of calibration, it is shown that the rank histogram should be used within a forecast-based stratification, while an observation-based stratification leads to significantly nonflat histograms for calibrated forecasts. Nevertheless, as previous studies have warned,...
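The contrast between forecast-based and observation-based stratification of the rank histogram can be illustrated with a small synthetic experiment. The sketch below is not taken from the paper: the Gaussian toy setup, the function names (`rank_histogram`, `max_deviation_from_flat`, `simulate`), the sample sizes, and the thresholds at zero are all assumptions chosen for illustration. Each case has a predictive distribution N(mu, 1); the observation and all ensemble members are drawn from it, so the ensemble is calibrated by construction. The forecast-based stratum is defined here on the predictive mean `mu` itself, not on a statistic computed from the finite ensemble.

```python
import numpy as np


def rank_histogram(ens, obs):
    """Relative frequency of the observation's rank among the ensemble members.

    ens: array of shape (n_cases, n_members); obs: array of shape (n_cases,).
    Rank k means that exactly k members are smaller than the observation.
    """
    n_members = ens.shape[1]
    ranks = (ens < obs[:, None]).sum(axis=1)  # values in 0 .. n_members
    counts = np.bincount(ranks, minlength=n_members + 1)
    return counts / counts.sum()


def max_deviation_from_flat(hist):
    """Largest absolute departure of a rank histogram from the uniform frequency."""
    return np.abs(hist - 1.0 / hist.size).max()


def simulate(n_cases=20000, n_members=9, seed=0):
    """Toy calibrated ensemble (illustrative assumption, not the paper's data)."""
    rng = np.random.default_rng(seed)
    mu = rng.normal(size=n_cases)              # predictive mean of each case
    obs = rng.normal(loc=mu)                   # observation ~ N(mu, 1)
    ens = rng.normal(loc=mu[:, None], size=(n_cases, n_members))  # members ~ N(mu, 1)
    return mu, ens, obs


mu, ens, obs = simulate()

strata = {
    "all cases": np.ones(obs.size, dtype=bool),
    "forecast-based (mu > 0)": mu > 0,         # criterion uses the forecast only
    "observation-based (obs > 0)": obs > 0,    # criterion uses the observation
}

devs = {}
for name, mask in strata.items():
    hist = rank_histogram(ens[mask], obs[mask])
    devs[name] = max_deviation_from_flat(hist)
    print(f"{name:30s} max deviation from flat: {devs[name]:.3f}")
```

In this toy setting, the forecast-based stratum should stay close to flat, while the stratum selected on high observations piles up in the upper ranks even though the ensemble is calibrated. A real application would replace the simulated arrays with archived forecast–observation pairs and would need a forecast-based criterion that is actually available from the forecasts.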
