The Effects of Imputing Missing Data on Ensemble Temperature Forecasts

A major obstacle to developing post-processing methods for numerical weather prediction (NWP) forecasting systems is the need for complete training datasets. Without a complete dataset, it is difficult, if not impossible, to train and verify statistical post-processing techniques, including ensemble consensus forecasting schemes. In addition, when ensemble forecast data are missing, a consensus forecast weighting scheme becomes difficult to apply in real time, and the quality of the uncertainty information derived from the ensemble is reduced. To address these problems, this study analyzes the treatment of missing data in ensemble model temperature forecasts to determine which method of replacing the missing data produces the lowest mean absolute error (MAE) of consensus forecasts while preserving the ensemble calibration. Several replacement methods are explored: persistence, a Fourier fit that captures seasonal variability, ensemble member mean substitution, three-day mean deviation, and an artificial neural network (ANN). The analysis is performed on 48-hour temperature forecasts for ten locations in the Pacific Northwest. The methods are evaluated according to their effect on the performance of two ensemble post-processing methods: an equal-weight consensus forecast and a consensus forecast weighted by performance over a ten-day window. The methods are also assessed with rank histograms to determine whether they preserve the calibration of the ensembles. For both post-processing techniques, all imputation methods except ensemble mean substitution produce mean absolute errors that are not significantly different from those of the cases in which all ensemble members are available. However, only the three-day mean deviation and the ANN yield rank histograms similar to that of the non-imputed baseline (i.e., the ensembles are appropriately calibrated) at all locations; persistence, ensemble mean, and Fourier substitution do not consistently produce appropriately calibrated ensembles. The three-day mean deviation has the additional advantage of being computationally efficient in a real-time forecasting environment.
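
The abstract does not spell out the imputation formulas, so the following Python sketch is only an illustration under stated assumptions, not the study's implementation. It assumes that "three-day mean deviation" means replacing a missing member forecast with the mean of the members available that day plus that member's average deviation from the ensemble mean over the previous three days. The function names, the array layout (days by members, `NaN` for missing), and the tie handling in the rank histogram are all hypothetical choices made for the example.

```python
import numpy as np


def impute_three_day_mean_deviation(fcst):
    """Fill NaNs in a (n_days, n_members) forecast array.

    ASSUMPTION (not taken from the paper): a missing value is replaced by
    today's mean of the available members plus the member's mean deviation
    from the ensemble mean over up to three preceding days.
    Assumes at least one member is available on every day.
    """
    filled = fcst.astype(float).copy()
    n_days, n_members = filled.shape
    for t in range(n_days):
        for m in range(n_members):
            if not np.isnan(fcst[t, m]):
                continue
            today_mean = np.nanmean(fcst[t])   # mean of members present today
            prev = filled[max(0, t - 3):t]     # earlier rows are already filled
            if prev.shape[0] == 0:
                deviation = 0.0                # no history: plain mean substitution
            else:
                deviation = np.mean(prev[:, m] - prev.mean(axis=1))
            filled[t, m] = today_mean + deviation
    return filled


def equal_weight_consensus_mae(fcst, obs):
    """MAE of the equal-weight consensus (simple ensemble mean) forecast."""
    return float(np.mean(np.abs(fcst.mean(axis=1) - obs)))


def rank_histogram(fcst, obs):
    """Count how often the observation falls in each of n_members + 1 ranks.

    The rank is the number of members strictly below the observation;
    ties are not randomized in this simplified version.
    """
    n_members = fcst.shape[1]
    ranks = np.sum(fcst < obs[:, None], axis=1)
    return np.bincount(ranks, minlength=n_members + 1)


if __name__ == "__main__":
    # Toy data: four days, three members, one missing forecast on the last day.
    forecasts = np.array([
        [10.0, 12.0, 17.0],
        [11.0, 13.0, 18.0],
        [12.0, 14.0, 19.0],
        [13.0, 15.0, np.nan],
    ])
    observations = np.array([12.0, 14.5, 13.0, 16.0])
    completed = impute_three_day_mean_deviation(forecasts)
    print(completed)
    print(equal_weight_consensus_mae(completed, observations))
    print(rank_histogram(completed, observations))
```

A roughly flat rank histogram for the completed ensemble would suggest that the imputation has not distorted the ensemble spread, which is the calibration property the study checks. Unlike plain ensemble mean substitution, the deviation term in this sketch keeps a member's recent offset from the ensemble mean instead of collapsing it onto the mean.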
