Compression, inversion, and approximate PCA of dense kernel matrices at near-linear computational complexity

Dense kernel matrices Θ∈R^(N×N) obtained from point evaluations of a covariance function G at locations {x_i}1≤i≤N arise in statistics, machine learning, and numerical analysis. For covariance functions that are Green's functions elliptic boundary value problems and approximately equally spaced sampling points, we show how to identify a subset S⊂{1,…,N}×{1,…,N}, with #S=O(Nlog(N)log^d(N/ϵ)), such that the zero fill-in block-incomplete Cholesky decomposition of Θ_(i,j)1_((i,j)∈S) is an ϵ-approximation of Θ. This block-factorisation can provably be obtained in O(Nlog^2(N)(log(1/ϵ)+log^2(N))^(4d+1)) complexity in time. Numerical evidence further suggests that element-wise Cholesky decomposition with the same ordering constitutes an O(Nlog^2(N)log^(2d)(N/ϵ)) solver. The algorithm only needs to know the spatial configuration of the x_i and does not require an analytic representation of G. Furthermore, an approximate PCA with optimal rate of convergence in the operator norm can be easily read off from this decomposition. Hence, by using only subsampling and the incomplete Cholesky decomposition, we obtain at nearly linear complexity the compression, inversion and approximate PCA of a large class of covariance matrices. By inverting the order of the Cholesky decomposition we also obtain a near-linear-time solver for elliptic PDEs.

[1]  Michele Benzi,et al.  Decay Bounds for Functions of Hermitian Matrices with Banded or Kronecker Structure , 2015, SIAM J. Matrix Anal. Appl..

[2]  J. Meijerink,et al.  An iterative solution method for linear systems of which the coefficient matrix is a symmetric -matrix , 1977 .

[3]  Larry L. Schumaker,et al.  Finite Elements: Theory, Fast Solvers, and Applications in Elasticity Theory , 2007 .

[4]  H. Ohanian,et al.  The linear approximation , 2013 .

[5]  Anthony Widjaja,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2003, IEEE Transactions on Neural Networks.

[6]  H. Rue,et al.  An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach , 2011 .

[7]  Tilmann Gneiting,et al.  Stochastic Models That Separate Fractal Dimension and the Hurst Effect , 2001, SIAM Rev..

[8]  J. E. H. Shaw,et al.  A Quasirandom Approach to Integration in Bayesian Statistics , 1988 .

[9]  Ulrich Trottenberg,et al.  On fast poisson solvers and applications , 1978 .

[10]  R. Schaback,et al.  A Practical Guide to Radial Basis Functions , 2022 .

[11]  Michael L. Stein,et al.  Limitations on low rank approximations for covariance matrices of spatial data , 2014 .

[12]  S. Mallat A wavelet tour of signal processing , 1998 .

[13]  Shivkumar Chandrasekaran,et al.  A Fast ULV Decomposition Solver for Hierarchically Semiseparable Representations , 2006, SIAM J. Matrix Anal. Appl..

[14]  Michalis K. Titsias,et al.  Variational Learning of Inducing Variables in Sparse Gaussian Processes , 2009, AISTATS.

[15]  Ralf Kornhuber,et al.  An analysis of a class of variational multiscale methods based on subspace decomposition , 2016, Math. Comput..

[16]  Daniele Venturi,et al.  Multifidelity Information Fusion Algorithms for High-Dimensional Systems and Massive Data sets , 2016, SIAM J. Sci. Comput..

[17]  Jianlin Xia,et al.  Fast algorithms for hierarchically semiseparable matrices , 2010, Numer. Linear Algebra Appl..

[18]  G. Beylkin,et al.  A Multiresolution Strategy for Reduction of Elliptic PDEs and Eigenvalue Problems , 1998 .

[19]  Zoubin Ghahramani,et al.  Sparse Gaussian Processes using Pseudo-inputs , 2005, NIPS.

[20]  W. Hackbusch,et al.  Introduction to Hierarchical Matrices with Applications , 2003 .

[21]  Rae. Z.H. Aliyev,et al.  Interpolation of Spatial Data , 2018, Biomedical Journal of Scientific & Technical Research.

[22]  Alan George,et al.  The Evolution of the Minimum Degree Ordering Algorithm , 1989, SIAM Rev..

[23]  Iain Murray,et al.  A framework for evaluating approximation methods for Gaussian process regression , 2012, J. Mach. Learn. Res..

[24]  Boris N. Khoromskij,et al.  A Sparse H-Matrix Arithmetic. Part II: Application to Multi-Dimensional Problems , 2000, Computing.

[25]  Christopher K. I. Williams,et al.  Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning) , 2005 .

[26]  Reinhold Schneider,et al.  Wavelet Galerkin Schemes for Boundary Integral Equations-Implementation and Quadrature , 2006, SIAM J. Sci. Comput..

[27]  D. Owen Handbook of Mathematical Functions with Formulas , 1965 .

[28]  Daniel Peterseim,et al.  Localization of elliptic multiscale problems , 2011, Math. Comput..

[29]  William F. Moss,et al.  Decay rates for inverses of band matrices , 1984 .

[30]  Andrew M. Stuart,et al.  Statistical analysis of differential equations: introducing probability measures on numerical solutions , 2016, Statistics and Computing.

[31]  M. Urner Scattered Data Approximation , 2016 .

[32]  Rob P. Stevenson,et al.  Computation of Singular Integral Operators in Wavelet Coordinates , 2005, Computing.

[33]  Adam J. Rothman,et al.  A new approach to Cholesky-based covariance regularization in high dimensions , 2009, 0903.0645.

[34]  R. Coifman,et al.  Fast wavelet transforms and numerical algorithms I , 1991 .

[35]  Michele Benzi,et al.  Orderings for Factorized Sparse Approximate Inverse Preconditioners , 1999, SIAM J. Sci. Comput..

[36]  B. Matérn Spatial variation : Stochastic models and their application to some problems in forest surveys and other sampling investigations , 1960 .

[37]  A. O'Hagan,et al.  Bayes–Hermite quadrature , 1991 .

[38]  Milton Abramowitz,et al.  Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables , 1964 .

[39]  S. Börm Efficient Numerical Methods for Non-local Operators , 2010 .

[40]  Gregory E. Fasshauer,et al.  Green’s Functions: Taking Another Look at Kernel Approximation, RadialBasis Functions, and Splines , 2012 .

[41]  M. Abramowitz,et al.  Handbook of Mathematical Functions With Formulas, Graphs and Mathematical Tables (National Bureau of Standards Applied Mathematics Series No. 55) , 1965 .

[42]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[43]  Nathan Halko,et al.  Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions , 2009, SIAM Rev..

[44]  Philipp Hennig,et al.  Probabilistic Interpretation of Linear Solvers , 2014, SIAM J. Optim..

[45]  M. Benzi Localization in Matrix Computations: Theory and Applications , 2016 .

[46]  Charles R. Johnson,et al.  Topics in Matrix Analysis , 1991 .

[47]  H. Owhadi,et al.  Polyharmonic homogenization, rough polyharmonic splines and sparse super-localization , 2012, 1212.0812.

[48]  D. Nychka,et al.  Covariance Tapering for Interpolation of Large Spatial Datasets , 2006 .

[49]  Lexing Ying,et al.  SelInv---An Algorithm for Selected Inversion of a Sparse Symmetric Matrix , 2011, TOMS.

[50]  Sergej Rjasanow,et al.  Adaptive Low-Rank Approximation of Collocation Matrices , 2003, Computing.

[51]  Stephen J. Roberts,et al.  Entropic Trace Estimates for Log Determinants , 2017, ECML/PKDD.

[52]  Houman Owhadi,et al.  Multigrid with Rough Coefficients and Multiresolution Operator Decomposition from Hierarchical Information Games , 2015, SIAM Rev..

[53]  Steven A. Orszag,et al.  CBMS-NSF REGIONAL CONFERENCE SERIES IN APPLIED MATHEMATICS , 1978 .

[54]  E. Haynsworth,et al.  An identity for the Schur complement of a matrix , 1969 .

[55]  Georg Stadler,et al.  Mitigating the Influence of the Boundary on PDE-based Covariance Operators , 2016, 1610.05280.

[56]  Zhiyi Chi,et al.  Approximating likelihoods for large spatial data sets , 2004 .

[57]  Michael A. Osborne,et al.  Probabilistic Integration: A Role for Statisticians in Numerical Analysis? , 2015 .

[58]  Carl E. Rasmussen,et al.  Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.

[59]  P. Whittle ON STATIONARY PROCESSES IN THE PLANE , 1954 .

[60]  Gregory Beylkin,et al.  LU Factorization of Non-standard Forms and Direct Multiresolution Solvers , 1998 .

[61]  Bernhard Schölkopf,et al.  Sparse Greedy Matrix Approximation for Machine Learning , 2000, International Conference on Machine Learning.

[62]  George C. Hsiao,et al.  Boundary element methods: an overview , 2006 .

[63]  Klaus Ritter,et al.  Bayesian numerical analysis , 2000 .

[64]  Wolfgang Dahmen,et al.  Compression Techniques for Boundary Integral Equations - Asymptotically Optimal Complexity Estimates , 2006, SIAM J. Numer. Anal..

[65]  L. Trefethen,et al.  Numerical linear algebra , 1997 .

[66]  Steffen Börm Approximation of solution operators of elliptic partial differential equations by H- and H2-matrices , 2007 .

[67]  W. Hackbusch,et al.  A fast iterative method for solving poisson’s equation in a general region , 1978 .

[68]  Paris Perdikaris,et al.  Numerical Gaussian Processes for Time-Dependent and Nonlinear Partial Differential Equations , 2017, SIAM J. Sci. Comput..

[69]  Takeshi Iwashita,et al.  Block Red-Black Ordering: A New Ordering Strategy for Parallelization of ICCG Method , 2004, International Journal of Parallel Programming.

[70]  J. Bramble,et al.  Estimation of Linear Functionals on Sobolev Spaces with Application to Fourier Transforms and Spline Interpolation , 1970 .

[71]  Per-Gunnar Martinsson,et al.  Compressing Rank-Structured Matrices via Randomized Sampling , 2015, SIAM J. Sci. Comput..

[72]  Joseph Guinness,et al.  Permutation and Grouping Methods for Sharpening Gaussian Process Approximations , 2016, Technometrics.

[73]  Jianqing Fan,et al.  An Overview of the Estimation of Large Covariance and Precision Matrices , 2015, The Econometrics Journal.

[74]  A. Brandt Multi-Level Adaptive Techniques (MLAT) for Partial Differential Equations: Ideas and Software , 1977 .

[75]  Mario Bebendorf,et al.  Why Finite Element Discretizations Can Be Factored by Triangular Hierarchical Matrices , 2007, SIAM J. Numer. Anal..

[76]  G. Beylkin On Multiresolution Methods in Numerical Analysis , 2007 .

[77]  M. Girolami,et al.  Bayesian Solution Uncertainty Quantification for Differential Equations , 2013 .

[78]  Jitendra Malik,et al.  Spectral grouping using the Nystrom method , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[79]  A. George Nested Dissection of a Regular Finite Element Mesh , 1973 .

[80]  Michael Feischl,et al.  Sparse Compression of Expected Solution Operators , 2018, SIAM J. Numer. Anal..

[81]  Charles A. Micchelli,et al.  A Survey of Optimal Recovery , 1977 .

[82]  Thore Husfeldt,et al.  Graph colouring algorithms , 2015, ArXiv.

[83]  Andrew M. Stuart,et al.  Inverse problems: A Bayesian perspective , 2010, Acta Numerica.

[84]  F. M. Larkin Gaussian measure in Hilbert space and applications in numerical analysis , 1972 .

[85]  Jianhua Z. Huang,et al.  A full scale approximation of covariance functions for large spatial data sets , 2012 .

[86]  MARIO BEBENDORF,et al.  Low-Rank Approximation of Elliptic Boundary Value Problems with High-Contrast Coefficients , 2014, SIAM J. Math. Anal..

[87]  G. Wahba,et al.  A Correspondence Between Bayesian Estimation on Stochastic Processes and Smoothing by Splines , 1970 .

[88]  Rob Stevenson,et al.  Adaptive wavelet methods for solving operator equations: An overview , 2009 .

[89]  Shai Dekel,et al.  The Bramble-Hilbert Lemma for Convex Domains , 2004, SIAM J. Math. Anal..

[90]  Wolfgang Hackbusch,et al.  A Sparse Matrix Arithmetic Based on H-Matrices. Part I: Introduction to H-Matrices , 1999, Computing.

[91]  Ming Gu,et al.  Efficient Algorithms for Computing a Strong Rank-Revealing QR Factorization , 1996, SIAM J. Sci. Comput..

[92]  M. Gu,et al.  Strong rank revealing LU factorizations , 2003 .

[93]  Michael A. Osborne,et al.  Probabilistic numerics and uncertainty in computations , 2015, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences.

[94]  Thomas Y. Hou,et al.  Sparse operator compression of higher-order elliptic operators with rough coefficients , 2017, Research in the Mathematical Sciences.

[95]  Michael L. Stein,et al.  The screening effect in Kriging , 2002 .

[96]  Benjamin Recht,et al.  Random Features for Large-Scale Kernel Machines , 2007, NIPS.

[97]  Leslie Greengard,et al.  A fast algorithm for particle simulations , 1987 .

[98]  Mark A. Girolami,et al.  Bayesian Probabilistic Numerical Methods , 2017, SIAM Rev..

[99]  Norbert Heuer,et al.  Adaptive Boundary Element Methods , 2014 .

[100]  Houman Owhadi,et al.  Bayesian Numerical Homogenization , 2014, Multiscale Model. Simul..

[101]  Thomas Strohmer,et al.  Localization of Matrix Factorizations , 2013, Found. Comput. Math..

[102]  Anoop Gupta,et al.  An efficient block-oriented approach to parallel sparse Cholesky factorization , 1993, Supercomputing '93. Proceedings.

[103]  Youssef M. Marzouk,et al.  Inference via Low-Dimensional Couplings , 2017, J. Mach. Learn. Res..

[104]  Matthias Katzfuss,et al.  A Multi-Resolution Approximation for Massive Spatial Datasets , 2015, 1507.04789.

[105]  D. Rose,et al.  Generalized nested dissection , 1977 .

[106]  Marco A. Iglesias,et al.  Hierarchical Bayesian level set inversion , 2016, Statistics and Computing.

[107]  Wolfgang Hackbusch,et al.  Multi-grid methods and applications , 1985, Springer series in computational mathematics.

[108]  C. Schwab,et al.  Boundary Element Methods , 2010 .

[109]  Mark A. Girolami,et al.  Probabilistic Meshless Methods for Partial Differential Equations and Bayesian Inverse Problems , 2016, ArXiv.

[110]  R. Tarjan,et al.  The analysis of a nested dissection algorithm , 1987 .

[111]  A. Gelfand,et al.  Gaussian predictive process models for large spatial data sets , 2008, Journal of the Royal Statistical Society. Series B, Statistical methodology.

[112]  Michael L. Stein,et al.  2010 Rietz lecture: When does the screening effect hold? , 2011, 1203.1801.

[113]  Leslie Greengard,et al.  Fast Direct Methods for Gaussian Processes , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[114]  Ronald A. DeVore,et al.  Multiscale, Nonlinear and Adaptive Approximation , 2009 .

[115]  Petteri Piiroinen,et al.  Constructing continuous stationary covariances as limits of the second-order stochastic difference equations , 2013 .

[116]  Lexing Ying,et al.  Hierarchical Interpolative Factorization for Elliptic Operators: Integral Equations , 2013, 1307.2666.

[117]  N. Cressie,et al.  Fixed rank kriging for very large spatial data sets , 2008 .

[118]  Lassi Roininen,et al.  Whittle-Matérn priors for Bayesian statistical inversion with applications in electrical impedance tomography , 2014 .

[119]  Alexander J. Smola,et al.  Sparse Greedy Gaussian Process Regression , 2000, NIPS.

[120]  J. Shewchuk An Introduction to the Conjugate Gradient Method Without the Agonizing Pain , 1994 .

[121]  Ilse C. F. Ipsen,et al.  Randomized matrix-free trace and log-determinant estimators , 2016, Numerische Mathematik.

[122]  Ingrid Daubechies,et al.  Ten Lectures on Wavelets , 1992 .

[123]  Jianlin Xia,et al.  New Efficient and Robust HSS Cholesky Factorization of SPD Matrices , 2012, SIAM J. Matrix Anal. Appl..

[124]  Houman Owhadi,et al.  Conditioning Gaussian measure on Hilbert space , 2015, 1506.04208.

[125]  Steffen Börm,et al.  Data-sparse Approximation by Adaptive ℋ2-Matrices , 2002, Computing.

[126]  P. Bickel,et al.  Regularized estimation of large covariance matrices , 2008, 0803.1909.

[127]  Danny Bickson,et al.  Gaussian Belief Propagation: Theory and Aplication , 2008, 0811.2518.

[128]  Donald E. Myers,et al.  Basic Linear Geostatistics , 1998, Technometrics.

[129]  Eric Darve,et al.  The black-box fast multipole method , 2009, J. Comput. Phys..

[130]  Matthias Katzfuss,et al.  A class of multi-resolution approximations for large spatial datasets , 2017, Statistica Sinica.

[131]  David Duvenaud,et al.  Probabilistic ODE Solvers with Runge-Kutta Means , 2014, NIPS.

[132]  Alex Smola,et al.  Kernel methods in machine learning , 2007, math/0701907.

[133]  Matthias W. Seeger,et al.  Using the Nyström Method to Speed Up Kernel Machines , 2000, NIPS.

[134]  D. Nychka,et al.  A Multiresolution Gaussian Process Model for the Analysis of Large Spatial Datasets , 2015 .

[135]  Mario Bebendorf,et al.  Mathematik in den Naturwissenschaften Leipzig Existence of H-Matrix Approximants to the Inverse FE-Matrix of Elliptic Operators with L ∞-Coefficients , 2003 .

[136]  Robert Schaback,et al.  Approximation of eigenfunctions in kernel-based spaces , 2014, Adv. Comput. Math..

[137]  P. Guttorp,et al.  Studies in the history of probability and statistics XLIX On the Matérn correlation family , 2006 .

[138]  Joseph W. H. Liu,et al.  On Finding Supernodes for Sparse Matrix Computations , 1993, SIAM J. Matrix Anal. Appl..

[139]  Houman Owhadi,et al.  Universal Scalable Robust Solvers from Computational Information Games and fast eigenspace adapted Multiresolution Analysis , 2017, 1703.10761.

[140]  Katya Scheinberg,et al.  Efficient SVM Training Using Low-Rank Kernel Representations , 2002, J. Mach. Learn. Res..

[141]  Pat Hanrahan,et al.  Wavelet radiosity , 1993, SIGGRAPH.

[142]  Helmut Harbrecht,et al.  Preconditioning of wavelet BEM by the incomplete Cholesky factorization , 2012, Comput. Vis. Sci..

[143]  Paris Perdikaris,et al.  Inferring solutions of differential equations using noisy multi-fidelity data , 2016, J. Comput. Phys..

[144]  Carl E. Rasmussen,et al.  A Unifying View of Sparse Approximate Gaussian Process Regression , 2005, J. Mach. Learn. Res..

[145]  Ralf Kornhuber,et al.  Numerical Homogenization of Elliptic Multiscale Problems by Subspace Decomposition , 2016, Multiscale Model. Simul..

[146]  Michael I. Jordan,et al.  Kernel independent component analysis , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[147]  J. Skilling Bayesian Solution of Ordinary Differential Equations , 1992 .

[148]  Stéphane Jaffard Propriétés des matrices « bien localisées » près de leur diagonale et quelques applications , 1990 .

[149]  Anton Schwaighofer,et al.  Transductive and Inductive Methods for Approximate Gaussian Process Regression , 2002, NIPS.

[150]  Daniel Peterseim,et al.  Numerical Homogenization of Heterogeneous Fractional Laplacians , 2017, Multiscale Model. Simul..