Hierarchical Inducing Point Gaussian Process for Inter-domain Observations

We examine the general problem of inter-domain Gaussian processes (GPs): problems where the GP realization and the noisy observations of that realization lie on different domains. When the mapping between those domains is linear, such as integration or differentiation, inference remains closed form. However, many of the scaling and approximation techniques that our community has developed do not apply to this setting. In this work, we introduce the hierarchical inducing point GP (HIP-GP), a scalable inter-domain GP inference method that enables us to improve the approximation accuracy by increasing the number of inducing points into the millions. HIP-GP, which relies on inducing points with grid structure and a stationary kernel assumption, is suitable for low-dimensional problems. In developing HIP-GP, we introduce (1) a fast whitening strategy, and (2) a novel preconditioner for conjugate gradients, which can be helpful in general GP settings.
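
To make the closed-form claim concrete, here is a minimal sketch (not the paper's code; all names and settings are illustrative) of GP regression from noisy derivative observations with an RBF kernel. Because differentiation is a linear operator, the cross-covariance between a derivative observation and a function value is simply a derivative of the kernel, and the standard regression formulas carry over unchanged.

```python
# Sketch: inter-domain GP regression with derivative observations.
import numpy as np

def rbf(x, xp, s2=1.0, l2=0.5):
    # k(x, x') = s2 * exp(-(x - x')**2 / (2 * l2))
    return s2 * np.exp(-(x[:, None] - xp[None, :]) ** 2 / (2 * l2))

def drbf_dx(x, xp, s2=1.0, l2=0.5):
    # d k / d x: cov(f'(x), f(x')) under the GP prior.
    return -(x[:, None] - xp[None, :]) / l2 * rbf(x, xp, s2, l2)

def d2rbf(x, xp, s2=1.0, l2=0.5):
    # d^2 k / (d x d x'): cov(f'(x), f'(x')).
    d = x[:, None] - xp[None, :]
    return (1.0 / l2 - d ** 2 / l2 ** 2) * rbf(x, xp, s2, l2)

rng = np.random.default_rng(0)
x_obs = np.linspace(-2.0, 2.0, 30)
y = np.cos(x_obs) + 0.05 * rng.standard_normal(30)  # observe f' where f = sin
x_test = np.linspace(-2.0, 2.0, 100)

K = d2rbf(x_obs, x_obs) + 0.05 ** 2 * np.eye(30)    # covariance of the data
Kx = drbf_dx(x_obs, x_test)                         # inter-domain cross-cov.
mean = Kx.T @ np.linalg.solve(K, y)                 # posterior mean of f
# `mean` approximately recovers sin(x_test), up to the constant offset
# that derivative data alone cannot determine.
```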
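
The grid-plus-stationarity assumption pays off because a stationary kernel evaluated on a regular 1-D grid yields a Toeplitz Gram matrix, whose matrix-vector product costs O(m log m) via FFTs after circulant embedding. Below is a minimal sketch of that standard trick, assuming a uniform grid; the paper's hierarchical decomposition of the inducing-point structure is not reproduced here.

```python
# Sketch: fast Toeplitz matrix-vector product for a stationary kernel.
import numpy as np

def toeplitz_matvec(first_col, v):
    # Embed the symmetric Toeplitz matrix (given by its first column)
    # in a circulant matrix of size 2m - 2 and multiply with FFTs.
    m = len(first_col)
    c = np.concatenate([first_col, first_col[-2:0:-1]])  # circulant embedding
    vpad = np.concatenate([v, np.zeros(m - 2)])
    prod = np.fft.irfft(np.fft.rfft(c) * np.fft.rfft(vpad), n=len(c))
    return prod[:m]

# Check against the dense product for an RBF kernel on a uniform grid.
m = 512
grid = np.linspace(0.0, 1.0, m)
col = np.exp(-(grid - grid[0]) ** 2 / 0.02)  # first column of K
K = np.exp(-(grid[:, None] - grid[None, :]) ** 2 / 0.02)
v = np.random.default_rng(1).standard_normal(m)
assert np.allclose(toeplitz_matvec(col, v), K @ v)
```

On multi-dimensional grids the analogous structure is block-Toeplitz, which is one reason grid-structured inducing points are attractive in low dimensions.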
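
For contribution (1), the sketch below shows the standard whitening reparameterization used with inducing-point GPs: inducing variables u ~ N(0, Kuu) are rewritten as u = L eps with Kuu = L Lᵀ and eps ~ N(0, I), so learning happens in the well-conditioned whitened coordinates. The paper's contribution is a fast version of this transform; the dense Cholesky below only illustrates the idea, with all settings illustrative.

```python
# Sketch: whitening of inducing variables via a dense Cholesky factor.
import numpy as np

m = 50
Z = np.linspace(0.0, 1.0, m)  # inducing-point locations
Kuu = np.exp(-(Z[:, None] - Z[None, :]) ** 2 / 0.01) + 1e-5 * np.eye(m)
L = np.linalg.cholesky(Kuu)   # Kuu = L @ L.T

eps = np.random.default_rng(2).standard_normal(m)  # whitened variable
u = L @ eps                                        # sample with cov Kuu
eps_back = np.linalg.solve(L, u)                   # un-whitening
assert np.allclose(eps_back, eps)
```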
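
For contribution (2), here is a minimal sketch of preconditioned conjugate gradients (PCG), the solver family the abstract's preconditioner targets. The paper's specific preconditioner is not reproduced; a simple Jacobi (diagonal) preconditioner stands in for it, and `apply_A` / `apply_Minv` are illustrative names for fast matrix-vector products with the system matrix and the preconditioner's inverse.

```python
# Sketch: PCG for a kernel system (K + sigma^2 I) x = b.
import numpy as np

def pcg(apply_A, b, apply_Minv, tol=1e-8, max_iter=1000):
    x = np.zeros_like(b)
    r = b - apply_A(x)   # residual
    z = apply_Minv(r)    # preconditioned residual
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = apply_A(p)
        alpha = rz / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Solve a kernel system without factorizing the matrix.
n = 300
X = np.linspace(0.0, 1.0, n)
K = np.exp(-(X[:, None] - X[None, :]) ** 2 / 0.02) + 1e-2 * np.eye(n)
b = np.sin(6.0 * X)
diag = np.diag(K).copy()
x = pcg(lambda v: K @ v, b, lambda v: v / diag)  # Jacobi preconditioner
assert np.allclose(K @ x, b, atol=1e-6)
```

A good preconditioner clusters the spectrum of the preconditioned system, which is what drives the iteration count down; the Jacobi choice above is only the simplest baseline.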
