Large-Scale Heteroscedastic Regression via Gaussian Process

Heteroscedastic regression, which accounts for noise levels that vary across observations, has many applications in fields such as machine learning and statistics. Here, we focus on heteroscedastic Gaussian process (HGP) regression, which integrates the latent function and the noise function in a unified nonparametric Bayesian framework. Despite its remarkable performance, HGP suffers from cubic time complexity, which strictly limits its application to big data. To improve scalability, we first develop a variational sparse inference algorithm, named VSHGP, to handle large-scale data sets. Furthermore, two variants are developed to improve the scalability and capability of VSHGP. The first is stochastic VSHGP (SVSHGP), which derives a factorized evidence lower bound, thus allowing efficient stochastic variational inference. The second is distributed VSHGP (DVSHGP), which follows the Bayesian committee machine formalism to distribute computations over multiple local VSHGP experts with many inducing points, and adopts hybrid parameters for experts to guard against overfitting and capture local variety. The superiority of DVSHGP and SVSHGP over existing scalable HGP and homoscedastic GP methods is then extensively verified on various data sets.
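
As a rough illustration of the Bayesian committee machine formalism mentioned above, the following minimal sketch combines the Gaussian predictions of several local experts at a single test point. It shows only the generic BCM aggregation rule under a zero-mean prior, not the specific aggregation used by DVSHGP; the function name `bcm_aggregate` and its arguments are hypothetical and chosen for illustration.

```python
def bcm_aggregate(means, variances, prior_variance):
    """Combine M expert predictions at one test point with the generic BCM rule.

    Assumptions (not taken from the paper): each expert returns a Gaussian
    prediction (mean, variance), the experts share a zero-mean GP prior with
    variance `prior_variance` at the test point, and the experts' data
    subsets are treated as conditionally independent.
    """
    if not means or len(means) != len(variances):
        raise ValueError("means and variances must be non-empty and equal in length")

    num_experts = len(means)

    # Sum the expert precisions, then subtract the prior precision that
    # would otherwise be counted M times instead of once.
    precision = sum(1.0 / v for v in variances) - (num_experts - 1) / prior_variance
    if precision <= 0.0:
        raise ValueError("aggregated precision must be positive")

    variance = 1.0 / precision
    # Precision-weighted combination of the expert means.
    mean = variance * sum(mu / v for mu, v in zip(means, variances))
    return mean, variance


if __name__ == "__main__":
    # Two hypothetical experts that disagree on the mean.
    print(bcm_aggregate([1.0, 2.0], [0.5, 0.5], prior_variance=1.0))
```

With a single expert the rule returns that expert's prediction unchanged. With more experts, the prior correction term keeps the shared prior from being counted once per expert.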
