A comparison of data imputation methods using Bayesian compressive sensing and Empirical Mode Decomposition for environmental temperature data

We present two Bayesian compressive sensing (BCS) imputation methods, BCS-on-Signal and BCS-on-IMF, and compare them with temporal and spatio-temporal methods. We build sparse BCS models from the available data, then use these models for imputation. In most BCS applications the sparse data are distributed across the computational space; in our adaptation the sparse data lie outside the reconstruction space. We used 30 years of temperature data and created gaps of 1% (110 days), 5% (1.5 years), 10% (3 years), and 20% (6 years). Performance was not sensitive to gap size, with RMSE slightly above 6 °C for the BCS-on-Signal and Temporal models, the two best methods. The methods that required data only from the target station performed as well as, or better than, the spatio-temporal model, which requires data from surrounding stations. Visually, the BCS-on-IMF results seem to better represent longer-period random temporal fluctuations, despite having poorer performance metrics.

Four models were evaluated for imputation of cyclic environmental data.
Sparsity-based methods, such as Bayesian Compressive Sensing (BCS), are applicable to data imputation.
BCS using only data from the target station performed as well as models requiring data from nearby stations.
Goodness-of-fit metrics were similar for gap sizes from 1 to 6 years.
There was no single best model; however, one BCS model ranked 1st or 2nd in all cases, and another gave the most visually realistic results.
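The evaluation protocol described above (remove a contiguous block from a 30-year daily record, impute it, and score the result with RMSE) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the synthetic series, the 365-day year, the centered gap placement, and the day-of-year climatology fill, which is only a simple single-station stand-in and not the BCS or Temporal models evaluated here. The RMSE values it yields reflect the synthetic noise level and are not comparable to the reported results.

```python
import math
import random

DAYS_PER_YEAR = 365  # assumption: leap days are ignored for simplicity


def gap_length_days(percent, n_years=30):
    """Days removed for a gap covering `percent` of the record (rounded half up)."""
    return (percent * n_years * DAYS_PER_YEAR + 50) // 100


def synthetic_temperature(n_years=30, seed=0):
    """Hypothetical daily series in deg C: annual sine cycle plus Gaussian noise."""
    rng = random.Random(seed)
    n_days = n_years * DAYS_PER_YEAR
    return [
        10.0 + 12.0 * math.sin(2.0 * math.pi * d / DAYS_PER_YEAR) + rng.gauss(0.0, 3.0)
        for d in range(n_days)
    ]


def climatology_impute(series, gap_start, gap_len):
    """Fill the gap with day-of-year means computed from data outside the gap."""
    sums = [0.0] * DAYS_PER_YEAR
    counts = [0] * DAYS_PER_YEAR
    gap_end = gap_start + gap_len
    for d, value in enumerate(series):
        if gap_start <= d < gap_end:
            continue  # held-out values must not inform the fill
        sums[d % DAYS_PER_YEAR] += value
        counts[d % DAYS_PER_YEAR] += 1
    return [
        sums[d % DAYS_PER_YEAR] / counts[d % DAYS_PER_YEAR]
        for d in range(gap_start, gap_end)
    ]


def rmse(estimated, observed):
    """Root-mean-square error between imputed and held-out values."""
    squared = sum((e - o) ** 2 for e, o in zip(estimated, observed))
    return math.sqrt(squared / len(observed))


def evaluate(percent, n_years=30, seed=0):
    """Create a centered gap of the given size, impute it, and return the RMSE."""
    series = synthetic_temperature(n_years, seed)
    gap_len = gap_length_days(percent, n_years)
    gap_start = (len(series) - gap_len) // 2
    filled = climatology_impute(series, gap_start, gap_len)
    return rmse(filled, series[gap_start:gap_start + gap_len])


if __name__ == "__main__":
    for percent in (1, 5, 10, 20):
        days = gap_length_days(percent)
        print(f"{percent:>2}% gap = {days} days, RMSE = {evaluate(percent):.2f} deg C")
```

Holding out the gap before computing the day-of-year means keeps the score an honest out-of-sample error, which is the property any imputation comparison of this kind depends on.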
