Tile Low-Rank Approximation of Large-Scale Maximum Likelihood Estimation on Manycore Architectures

Maximum likelihood estimation is an important statistical technique for estimating missing data, for example in climate and environmental applications, which are usually large and feature data points that are irregularly spaced. In particular, the Gaussian log-likelihood function is the \emph{de facto} model, which operates on the resulting sizable dense covariance matrix. The advent of high performance systems with advanced computing power and memory capacity have enabled full simulations only for rather small dimensional climate problems, solved at the machine precision accuracy. The challenge for high dimensional problems lies in the computation requirements of the log-likelihood function, which necessitates ${\mathcal O}(n^2)$ storage and ${\mathcal O}(n^3)$ operations, where $n$ represents the number of given spatial locations. This prohibitive computational cost may be reduced by using approximation techniques that not only enable large-scale simulations otherwise intractable but also maintain the accuracy and the fidelity of the spatial statistics model. In this paper, we extend the Exascale GeoStatistics software framework (i.e., ExaGeoStat) to support the Tile Low-Rank (TLR) approximation technique, which exploits the data sparsity of the dense covariance matrix by compressing the off-diagonal tiles up to a user-defined accuracy threshold. The underlying linear algebra operations may then be carried out on this data compression format, which may ultimately reduce the arithmetic complexity of the maximum likelihood estimation and the corresponding memory footprint. Performance results of TLR-based computations on shared and distributed-memory systems attain up to 13X and 5X speedups, respectively, compared to full accuracy simulations using synthetic and real datasets (up to 2M), while ensuring adequate prediction accuracy.

[1]  Ivan V. Oseledets,et al.  "Compress and eliminate" solver for symmetric positive definite sparse matrices , 2016, SIAM J. Sci. Comput..

[2]  Jack Dongarra,et al.  Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects , 2009 .

[3]  Cédric Augonnet,et al.  StarPU: a unified platform for task scheduling on heterogeneous multicore architectures , 2011, Concurr. Comput. Pract. Exp..

[4]  Jianhua Z. Huang,et al.  A full scale approximation of covariance functions for large spatial data sets , 2012 .

[5]  Douglas W. Nychka,et al.  Covariance Tapering for Likelihood-Based Estimation in Large Spatial Data Sets , 2008 .

[6]  N. Cressie,et al.  A dimension-reduced approach to space-time Kalman filtering , 1999 .

[7]  Ying Sun,et al.  Geostatistics for Large Datasets , 2012 .

[8]  David E. Keyes,et al.  Tile Low Rank Cholesky Factorization for Climate/Weather Modeling Applications on Manycore Architectures , 2017, ISC.

[9]  Chong Gu Smoothing Spline Anova Models , 2002 .

[10]  Marc G. Genton,et al.  Correlation Models for Temperature Fields , 2011 .

[11]  Samuel Williams,et al.  An Efficient Multicore Implementation of a Novel HSS-Structured Multifrontal Solver Using Randomized Sampling , 2015, SIAM J. Sci. Comput..

[12]  Jordan G. Powers,et al.  A Description of the Advanced Research WRF Version 2 , 2005 .

[13]  David E. Keyes,et al.  Hierarchical Decompositions for the Computation of High-Dimensional Multivariate Normal Probabilities , 2018 .

[14]  Ronald P. Barry,et al.  Flexible Spatial Models for Kriging and Cokriging Using Moving Averages and the Fast Fourier Transform (FFT) , 2004 .

[15]  Ying Sun,et al.  Statistically and Computationally Efficient Estimating Equations for Large Spatial Datasets , 2016 .

[16]  David E. Keyes,et al.  ExaGeoStat: A High Performance Unified Software for Geostatistics on Manycore Systems , 2017, IEEE Transactions on Parallel and Distributed Systems.

[17]  Hermann G. Matthies,et al.  Application of hierarchical matrices for computing the Karhunen–Loève expansion , 2009, Computing.

[18]  Eric Darve,et al.  A fast, memory efficient and robust sparse preconditioner based on a multifrontal approach with applications to finite‐element matrices , 2016 .

[19]  M. Stein,et al.  A Bayesian analysis of kriging , 1993 .

[20]  N. Cressie,et al.  Fixed rank kriging for very large spatial data sets , 2008 .

[21]  Alfredo Buttari,et al.  On the Complexity of the Block Low-Rank Multifrontal Factorization , 2017, SIAM J. Sci. Comput..

[22]  W. Hackbusch A Sparse Matrix Arithmetic Based on $\Cal H$-Matrices. Part I: Introduction to ${\Cal H}$-Matrices , 1999, Computing.

[23]  Steffen Börm,et al.  Approximating Gaussian Processes with H2-Matrices , 2007, ECML.

[24]  A. Gelfand,et al.  Gaussian predictive process models for large spatial data sets , 2008, Journal of the Royal Statistical Society. Series B, Statistical methodology.

[25]  Roger Woodard,et al.  Interpolation of Spatial Data: Some Theory for Kriging , 1999, Technometrics.

[26]  P. Guttorp,et al.  Studies in the history of probability and statistics XLIX On the Matérn correlation family , 2006 .

[27]  Eric Darve,et al.  Sparse supernodal solver using block low-rank compression: Design, performance and analysis , 2018, J. Comput. Sci..

[28]  M. Anitescu,et al.  STOCHASTIC APPROXIMATION OF SCORE FUNCTIONS FOR GAUSSIAN PROCESSES , 2013, 1312.2687.

[29]  David E. Keyes,et al.  Exploiting Data Sparsity for Large-Scale Matrix Computations , 2018, Euro-Par.

[30]  Huang Huang,et al.  Hierarchical Low Rank Approximation of Likelihoods for Large Spatial Datasets , 2016, 1605.08898.

[31]  M. Genton Separable approximations of space‐time covariance matrices , 2007 .

[32]  Eric Darve,et al.  Fast hierarchical solvers for sparse matrices using low-rank approximation , 2015 .