An evolutionary spectrum approach to incorporate large‐scale geographical descriptors on global processes

We introduce a nonstationary spatio-temporal statistical model for gridded data on the sphere. The model specifies a computationally convenient covariance structure that depends on heterogeneous geography. Widely used statistical models on a spherical domain are nonstationary across latitudes but stationary along each latitude (axial symmetry). This assumption has been acknowledged to be too restrictive for quantities such as surface temperature, whose statistical behavior is influenced by large-scale geographical descriptors such as land and ocean. We propose an evolutionary spectrum approach that accounts for different regimes across the Earth's geography and yields a more general and flexible class of models, one that vastly outperforms axially symmetric models and captures longitudinal patterns that would otherwise be assumed constant. The model can be estimated with a multi-step conditional likelihood approximation that preserves the nonstationary features while allowing for easily distributed computations: we show how a data set of more than 20 million data points can be fitted in less than one day on a state-of-the-art workstation. Once the parameters are estimated, surrogate runs can be generated instantaneously on a common laptop. Further, the resulting estimates from the statistical model can be regarded as a synthetic description (i.e. a compression) of the space-time characteristics of an entire initial condition ensemble. Compared to traditional algorithms that compress the bit-by-bit information in each climate model run, the proposed approach achieves vastly superior compression rates.
