Modulating Scalable Gaussian Processes for Expressive Statistical Learning

For a learning task, the Gaussian process (GP) is well suited to learning the statistical relationship between inputs and outputs, since it offers not only the mean prediction but also the associated variability. The vanilla GP, however, struggles to learn complicated distributions exhibiting, e.g., heteroscedastic noise, multi-modality, and non-stationarity from massive data, owing to its Gaussian marginals and cubic complexity. To this end, this article studies new scalable GP paradigms, including the non-stationary heteroscedastic GP, the mixture of GPs, and the latent GP, which introduce additional latent variables to modulate the outputs or inputs in order to learn richer, non-Gaussian statistical representations. We further employ different variational inference strategies to arrive at analytical or tighter evidence lower bounds (ELBOs) on the marginal likelihood for efficient and effective model training. Extensive numerical experiments against state-of-the-art GP and neural network (NN) counterparts on various tasks verify the superiority of these scalable modulated GPs, especially the scalable latent GP, in learning diverse data distributions.
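
For concreteness, the kind of bound such scalable modulated GPs typically build on can be sketched as follows. This is an illustrative, generic form in the spirit of sparse variational GP inference with inducing variables, not the paper's exact ELBO: an auxiliary latent process g (for instance a log-noise process for heteroscedasticity, a gating variable for a mixture, or a latent input) modulates the likelihood alongside the primary GP f, and each process carries its own inducing-variable KL term,

\[
\log p(\mathbf{y} \mid \mathbf{X})
  \;\ge\;
  \sum_{i=1}^{N} \mathbb{E}_{q(f_i)\, q(g_i)}
      \big[\log p(y_i \mid f_i, g_i)\big]
  \;-\; \mathrm{KL}\!\left[q(\mathbf{u}_f)\,\|\,p(\mathbf{u}_f)\right]
  \;-\; \mathrm{KL}\!\left[q(\mathbf{u}_g)\,\|\,p(\mathbf{u}_g)\right],
\]

where \(\mathbf{u}_f\) and \(\mathbf{u}_g\) are the inducing variables of the two processes and \(q(f_i), q(g_i)\) are the marginals of the variational posteriors at the i-th input. Because the expected log-likelihood decomposes over data points, the bound admits mini-batch stochastic optimization, which is the source of the scalability; importance-weighted variants of the same construction can be used to tighten the bound.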
