Dataset Reduction Techniques to Speed Up SVD Analyses on Big Geo-Datasets

The Singular Value Decomposition (SVD) is a mathematical procedure with multiple applications in the geosciences. For instance, it is used for dimensionality reduction and as a supporting operator in various analytical tasks on spatio-temporal data. Performing SVD analyses on large datasets, however, can be computationally costly, time-consuming, and sometimes practically infeasible. Fortunately, techniques exist that arrive at the same output, or at a close approximation, with far less effort. This article examines several such techniques in relation to the inherent scale of the structure within the data. When the values of a dataset vary slowly, e.g., in a spatial temperature field over a country, the field is autocorrelated and contains large-scale structure. Datasets describing such fields do not need a high resolution, and their analysis can benefit from alternative SVD techniques based on rank deficiency, coarsening, or matrix factorization. We use both simulated Gaussian random fields with various levels of autocorrelation and real-world geospatial datasets to illustrate our study while examining the accuracy of the various SVD techniques. As the main result, this article provides researchers with a decision tree that indicates which technique to use when, and predicts the resulting level of accuracy from the scale of the dataset's structure.
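The trade-off described above can be illustrated with a small NumPy sketch. This is not the authors' code, and every name and parameter in it is an illustrative assumption. The field generator `smooth_fields` is a hypothetical stand-in for a Gaussian random field: white noise smoothed by a Gaussian kernel whose `length_scale` controls the degree of autocorrelation. Block averaging (`coarsen`) stands in for coarsening. `randomized_svd` is a basic random range-finder scheme in the style of Halko, Martinsson and Tropp, used here as one example of a matrix-factorization shortcut. The sketch compares these against a full SVD of the same space-by-time matrix.

```python
import numpy as np


def smooth_fields(n_side, n_time, length_scale, rng):
    """Hypothetical GRF stand-in (an assumption, not the article's generator):
    white noise smoothed by a separable Gaussian kernel. A larger length_scale
    (in grid cells) gives stronger spatial autocorrelation."""
    x = np.arange(n_side)
    k = np.exp(-0.5 * ((x[:, None] - x[None, :]) / length_scale) ** 2)
    k /= k.sum(axis=1, keepdims=True)          # each kernel row sums to 1
    noise = rng.standard_normal((n_time, n_side, n_side))
    return k @ noise @ k.T                     # smooth along both grid axes


def to_matrix(fields):
    """Reshape (time, y, x) fields into a (space, time) matrix and remove each
    grid cell's temporal mean, as is usual before an EOF-style SVD."""
    m = fields.reshape(fields.shape[0], -1).T
    return m - m.mean(axis=1, keepdims=True)


def coarsen(fields, factor):
    """Block-average every field over factor x factor cells
    (grid sides must be divisible by factor)."""
    t, ny, nx = fields.shape
    return fields.reshape(t, ny // factor, factor, nx // factor, factor).mean(axis=(2, 4))


def randomized_svd(a, rank, n_oversample=10, rng=None):
    """Basic randomized SVD: random range finder, no power iterations."""
    rng = np.random.default_rng() if rng is None else rng
    omega = rng.standard_normal((a.shape[1], rank + n_oversample))
    q, _ = np.linalg.qr(a @ omega)             # orthonormal basis for the sampled range of a
    u_b, s, vt = np.linalg.svd(q.T @ a, full_matrices=False)
    return (q @ u_b)[:, :rank], s[:rank], vt[:rank]


def explained_variance(s, k, total):
    """Fraction of the total variance (squared Frobenius norm) captured
    by the k leading singular values."""
    return float(np.sum(s[:k] ** 2) / total)


rng = np.random.default_rng(0)
fields = smooth_fields(n_side=64, n_time=100, length_scale=8.0, rng=rng)

x_full = to_matrix(fields)                     # 4096 grid cells x 100 time steps
x_coarse = to_matrix(coarsen(fields, 4))       # 256 grid cells x 100 time steps
s_full = np.linalg.svd(x_full, compute_uv=False)
s_coarse = np.linalg.svd(x_coarse, compute_uv=False)
_, s_rand, _ = randomized_svd(x_full, rank=10, rng=rng)

# Coarsened singular values live on a different scale (block averages),
# so each method is compared through variance fractions of its own matrix.
print(f"full      : {explained_variance(s_full, 5, np.sum(x_full ** 2)):.3f}")
print(f"coarsened : {explained_variance(s_coarse, 5, np.sum(x_coarse ** 2)):.3f}")
print(f"randomized: {explained_variance(s_rand, 5, np.sum(x_full ** 2)):.3f}")
```

Varying the assumed `length_scale` and the coarsening `factor` is one way to explore how accuracy depends on the scale of the structure in the data. The printed fractions are illustrative only and are not results from the article.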
