High Performance Multivariate Geospatial Statistics on Manycore Systems

Modeling and inferring spatial relationships and predicting missing values in environmental data are among the main tasks of geospatial statisticians. These routine tasks are accomplished using multivariate geospatial models and the cokriging technique, whose workflow requires evaluating the expensive Gaussian log-likelihood function for parameter estimation; this cost has impeded the adoption of multivariate geospatial models for large multivariate spatial datasets. This large-scale cokriging challenge, however, offers fertile ground for supercomputing implementations in the geospatial statistics community, since computational capability must scale to match the growth of environmental data driven by the widespread use of diverse data collection technologies. In this article, we develop and deploy large-scale multivariate spatial modeling and inference on parallel hardware architectures. To tackle the increasing complexity of the matrix operations and the massive concurrency of parallel systems, we combine low-rank matrix approximation techniques with task-based programming models and schedule the resulting asynchronous computational tasks using a dynamic runtime system. The proposed framework provides both dense and approximated computations of the Gaussian log-likelihood function, and it demonstrates robust accuracy and scalable performance on a variety of computer systems. On both synthetic and real datasets, the low-rank matrix approximation outperforms exact computation while still meeting the application's requirements for parameter estimation and prediction accuracy. We also propose a novel algorithm to assess prediction accuracy after online parameter estimation. The algorithm quantifies prediction performance and provides a benchmark for measuring the efficiency and accuracy of several approximation techniques in multivariate spatial modeling.
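To make the cost of the exact computation concrete, here is a minimal pure-Python sketch of the dense path only, under assumptions that are ours and not the article's: a zero-mean bivariate field with no nugget, a separable exponential cross-covariance C_ij(h) = rho_ij * sigma_i * sigma_j * exp(-h / a) (the parsimonious Matérn model with smoothness 0.5 for both variables and a shared range a), and observations stacked variable by variable. It evaluates the Gaussian log-likelihood l(theta) = -(n/2) log(2 pi) - (1/2) log|Sigma(theta)| - (1/2) z^T Sigma(theta)^{-1} z through a Cholesky factorization, and uses the same Cholesky-based solves for a simple cokriging prediction of the first variable. All function names are hypothetical. The article's framework instead splits these operations into tiled tasks scheduled by a runtime system and can replace tiles with low-rank approximations, which this sketch does not attempt.

```python
import math


def bivariate_exp_cov(locs, sigma1, sigma2, rho, a):
    """Dense covariance of a zero-mean bivariate field, variable-major order.

    Assumed model: C_ij(h) = rho_ij * sigma_i * sigma_j * exp(-h / a), i.e. a
    parsimonious Matern with smoothness 0.5 for both variables and a shared
    range a. Rows 0..n-1 hold variable 1, rows n..2n-1 hold variable 2.
    """
    n = len(locs)
    sig = [sigma1, sigma2]
    corr = [[1.0, rho], [rho, 1.0]]
    cov = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(2):
        for j in range(2):
            for k in range(n):
                for m in range(n):
                    h = math.dist(locs[k], locs[m])
                    cov[i * n + k][j * n + m] = (
                        corr[i][j] * sig[i] * sig[j] * math.exp(-h / a)
                    )
    return cov


def cholesky(a):
    """Lower-triangular L with L L^T = a; a must be symmetric positive definite."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("covariance matrix is not positive definite")
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            L[i][j] = (a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def backward_solve_t(L, b):
    """Solve L^T x = b for lower-triangular L."""
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gaussian_loglik(cov, z):
    """-n/2 log(2 pi) - 1/2 log|cov| - 1/2 z^T cov^{-1} z, via one Cholesky factor."""
    L = cholesky(cov)
    v = forward_solve(L, z)  # v = L^{-1} z, so z^T cov^{-1} z = v^T v
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(len(z)))
    return (-0.5 * len(z) * math.log(2.0 * math.pi)
            - 0.5 * logdet - 0.5 * sum(t * t for t in v))


def simple_cokrige_var1(locs, z, s0, sigma1, sigma2, rho, a):
    """Zero-mean (simple) cokriging of variable 1 at s0 from both variables."""
    n = len(locs)
    sig = [sigma1, sigma2]
    corr_row = [1.0, rho]  # correlation between variable 1 and variable j
    L = cholesky(bivariate_exp_cov(locs, sigma1, sigma2, rho, a))
    w = backward_solve_t(L, forward_solve(L, z))  # w = cov^{-1} z
    # Cov(Z_1(s0), observation), in the same variable-major order as z
    c0 = [corr_row[j] * sig[0] * sig[j] * math.exp(-math.dist(s0, locs[m]) / a)
          for j in range(2) for m in range(n)]
    return sum(ci * wi for ci, wi in zip(c0, w))
```

For example, `gaussian_loglik(bivariate_exp_cov(locs, 1.0, 2.0, 0.5, 0.4), z)` scores one candidate parameter vector; a maximum likelihood fit repeats this for many candidates. Because the dense Cholesky factorization costs O(n^3) floating-point operations and O(n^2) storage, with n equal to the number of variables times the number of locations, doubling the number of locations multiplies the factorization work by roughly eight, which is why exact likelihood evaluation becomes the bottleneck for large multivariate datasets.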
