GP-HMAT: Scalable, $\mathcal{O}(n\log(n))$ Gaussian Process Regression with Hierarchical Low-Rank Matrices

A Gaussian process (GP) is a powerful and widely used regression technique. The main building block of GP regression is the covariance kernel, which characterizes the relationship between pairs of points in the random field. The optimization to find the optimal kernel, however, requires several large-scale and often unstructured matrix inversions. We tackle this challenge by introducing a hierarchical matrix approach, named HMAT, which recursively decomposes the matrix structure into significantly smaller matrices on which a direct approach can be used for inversion. Our matrix partitioning uses a particular aggregation strategy for data points that promotes the low-rank structure of the off-diagonal blocks in the hierarchical kernel matrix. We employ a randomized linear algebra method to reduce the low-rank off-diagonal blocks without factorizing a large matrix. We provide analytical error and cost estimates for the matrix inversion, investigate them empirically with numerical computations, and demonstrate the application of our approach on three numerical examples involving GP regression for engineering problems and a large-scale real dataset. We provide the computer implementation of GP-HMAT (HMAT adapted for GP likelihood and derivative computations) as well as the implementation of the last numerical example on the real dataset. We demonstrate the superior scalability of the HMAT approach compared to the built-in backslash operator in MATLAB for large-scale linear solves $Ax = y$ via a repeatable and verifiable empirical study. An extension to hierarchical semiseparable (HSS) matrices is discussed as future research.
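The abstract refers to compressing low-rank off-diagonal kernel blocks with a randomized linear algebra method. The following Python sketch illustrates that general idea on a single off-diagonal block of a GP covariance matrix; it is a minimal, illustrative example only, not the released GP-HMAT MATLAB implementation, and the kernel choice (squared exponential), rank, and oversampling parameter are assumptions made here for demonstration.

import numpy as np

def se_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    # Squared-exponential covariance between two point sets (illustrative kernel choice).
    d2 = np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :] - 2.0 * X1 @ X2.T
    return variance * np.exp(-0.5 * np.maximum(d2, 0.0) / lengthscale**2)

def randomized_low_rank(A, rank, oversample=10, seed=0):
    # Halko-Martinsson-Tropp style randomized factorization A ~= U @ V,
    # built from matrix products with A rather than a full factorization of A.
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], rank + oversample))
    Q, _ = np.linalg.qr(A @ Omega)              # orthonormal basis for the approximate range of A
    B = Q.T @ A                                 # small (rank + oversample) x n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ (Ub[:, :rank] * s[:rank])           # keep the leading `rank` terms
    return U, Vt[:rank, :]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two well-separated point clusters: the cross-covariance block between them
    # is numerically low rank, mimicking an off-diagonal block of a hierarchical kernel matrix.
    X1 = rng.uniform(0.0, 1.0, size=(500, 2))
    X2 = rng.uniform(3.0, 4.0, size=(400, 2))
    K12 = se_kernel(X1, X2, lengthscale=1.0)
    U, V = randomized_low_rank(K12, rank=15)
    rel_err = np.linalg.norm(K12 - U @ V) / np.linalg.norm(K12)
    print(f"relative compression error of off-diagonal block: {rel_err:.2e}")

In a hierarchical scheme such as the one described above, compressed factors like U and V for the off-diagonal blocks are what keep the recursive solves and the overall GP likelihood evaluation near $\mathcal{O}(n\log(n))$; the details of the partitioning and aggregation strategy are specific to the paper and not reproduced here.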
