Spatio-temporal filling of missing points in geophysical data sets

Abstract. The majority of data sets in the geosciences are obtained from observations and measurements of natural systems, rather than in the laboratory. These data sets are often full of gaps, due to the conditions under which the measurements are made. Missing data give rise to various problems, for example in spectral estimation or in specifying boundary conditions for numerical models. Here we use Singular Spectrum Analysis (SSA) to fill the gaps in several types of data sets. For a univariate record, our procedure uses only temporal correlations in the data to fill in the missing points. For a multivariate record, multi-channel SSA (M-SSA) takes advantage of both spatial and temporal correlations. We iteratively produce estimates of missing data points, which are then used to compute a self-consistent lag-covariance matrix; cross-validation allows us to optimize the window width and the number of dominant SSA or M-SSA modes used to fill the gaps. The optimal parameters of our procedure depend on the distribution in time (and space) of the missing data, as well as on the variance distribution between oscillatory modes and noise. The algorithm is demonstrated on synthetic examples, as well as on data sets from oceanography, hydrology, atmospheric sciences, and space physics: global sea-surface temperature, flood-water records of the Nile River, the Southern Oscillation Index (SOI), and satellite observations of relativistic electrons.
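
The iterative idea described in the abstract can be illustrated for the univariate case with a minimal Python sketch. It is a simplified illustration under stated assumptions, not the authors' implementation: the function names (`ssa_reconstruct`, `ssa_fill`, `cv_error`) are hypothetical, the series mean is estimated once rather than re-estimated at each iteration, the number of modes is held fixed during the iteration, and the SSA modes are obtained from an SVD of the trajectory matrix instead of an explicitly formed lag-covariance matrix.

```python
import numpy as np


def ssa_reconstruct(x, window, n_modes):
    """Reconstruct a complete series from its leading n_modes SSA components."""
    n = len(x)
    k = n - window + 1
    # Trajectory matrix: column j holds the lagged segment x[j : j + window].
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    approx = (u[:, :n_modes] * s[:n_modes]) @ vt[:n_modes]
    # Diagonal averaging: average all matrix entries that map to the same time.
    recon = np.zeros(n)
    counts = np.zeros(n)
    for j in range(k):
        recon[j:j + window] += approx[:, j]
        counts[j:j + window] += 1
    return recon / counts


def ssa_fill(x, window, n_modes, n_iter=100, tol=1e-6):
    """Fill NaN gaps in x by iterating SSA reconstructions (simplified sketch)."""
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    if not missing.any():
        return x.copy()
    mean = np.nanmean(x)  # assumption: mean estimated once from observed points
    work = np.where(missing, 0.0, x - mean)  # gaps start at the (centered) mean
    for _ in range(n_iter):
        recon = ssa_reconstruct(work, window, n_modes)
        change = np.max(np.abs(recon[missing] - work[missing]))
        work[missing] = recon[missing]  # only the gap values are updated
        if change < tol:
            break
    # Observed values are returned untouched; only gaps receive estimates.
    return np.where(missing, work + mean, x)


def cv_error(x, window, n_modes, frac=0.1, seed=0):
    """RMS error on a randomly withheld fraction of the observed points."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    observed = np.flatnonzero(~np.isnan(x))
    n_hold = max(1, int(frac * observed.size))
    held_out = rng.choice(observed, size=n_hold, replace=False)
    trial = x.copy()
    trial[held_out] = np.nan  # pretend these observations are missing
    filled = ssa_fill(trial, window, n_modes)
    return float(np.sqrt(np.mean((filled[held_out] - x[held_out]) ** 2)))
```

In this sketch, the window width and the number of modes would be chosen by evaluating `cv_error` over a grid of candidate values and keeping the pair with the smallest error. A multivariate (M-SSA) variant would stack lagged copies of every channel into the trajectory matrix, which is not shown here.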
