Measuring forecast performance in the presence of observation error

A new framework is introduced for measuring the performance of probability forecasts when the true value of the predictand is observed with error. In these circumstances, proper scoring rules favour good forecasts of the observations rather than of the truth, and yield scores that vary with the quality of the observations. Proper scoring rules can thus favour forecasters who issue worse forecasts of the truth, and can mask real changes in forecast performance if observation quality varies over time. Existing approaches to accounting for observation error provide unsatisfactory solutions to these two problems. A new class of ‘error-corrected’ proper scoring rules is defined that solves both problems by producing unbiased estimates of the scores that would be obtained if the forecasts could be verified against the truth. A general method for constructing error-corrected proper scoring rules is given for categorical predictands, and error-corrected versions of the Dawid-Sebastiani scoring rule are proposed for numerical predictands. The benefits of accounting for observation error in ensemble post-processing and in forecast verification are illustrated in three data examples, including forecasts for the occurrence of tornadoes and of aircraft icing.
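
To make the two constructions concrete: for a categorical predictand with a known error model, an error-corrected score can be built by solving a small linear system, choosing a corrected score whose expectation given the truth reproduces the ordinary score against the truth. The Python sketch below illustrates this idea for the Brier score of a binary event. It is a minimal illustration, not code from the paper: the error rates `alpha` (false alarm) and `beta` (miss) and both function names are assumed for illustration, and the misclassification matrix is assumed known and invertible. The second function shows one natural error-corrected Dawid-Sebastiani score under the assumption of additive, unbiased observation error with known variance `omega2`.

```python
import numpy as np

def error_corrected_brier(p, y, alpha, beta):
    """Unbiased estimate of the Brier score against the truth.

    p     : forecast probability that the event (truth X = 1) occurs
    y     : observed outcome in {0, 1}, possibly misclassified
    alpha : P(Y = 1 | X = 0), false-alarm rate of the observing system
    beta  : P(Y = 0 | X = 1), miss rate of the observing system
    """
    # Misclassification matrix: rows index the truth x, columns the
    # observation y. (alpha and beta are assumed known; illustrative only.)
    Pi = np.array([[1.0 - alpha, alpha],
                   [beta, 1.0 - beta]])
    # Ordinary Brier score evaluated at each possible value of the truth.
    s = np.array([p ** 2, (p - 1.0) ** 2])
    # Solve Pi @ c = s so that E[c(Y) | X = x] = s(p, x): scoring the
    # error-prone observation y with c is then unbiased for the score
    # that would be obtained against the truth x.
    c = np.linalg.solve(Pi, s)
    return c[y]

def error_corrected_ds(mu, sigma2, y, omega2):
    """Dawid-Sebastiani score corrected for additive observation error.

    Assumes Y = X + e with E[e] = 0 and Var(e) = omega2 known, so that
    E[(Y - mu)^2 | X] = (X - mu)^2 + omega2, and subtracting omega2
    removes the bias in the squared-error term.
    """
    return np.log(sigma2) + ((y - mu) ** 2 - omega2) / sigma2

# With a perfect observing system the correction reduces to the
# ordinary Brier score.
assert np.isclose(error_corrected_brier(0.3, 1, alpha=0.0, beta=0.0),
                  (0.3 - 1.0) ** 2)
```

Note that with nonzero error rates the corrected score for an individual case can fall outside the range of the ordinary score; only its average over many cases is guaranteed to estimate the truth-based score without bias.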
