Generalized Robust Bayesian Committee Machine for Large-scale Gaussian Process Regression

In order to scale standard Gaussian process (GP) regression to large-scale datasets, aggregation models employ a factorized training process and then combine the predictions of distributed experts. The state-of-the-art aggregation models, however, either provide inconsistent predictions or require a time-consuming aggregation process. We first prove the inconsistency of typical aggregations that use disjoint or random data partitions, and then present a consistent yet efficient aggregation model for large-scale GP regression. The proposed model inherits the advantages of aggregation models, e.g., closed-form inference and aggregation, parallelization, and distributed computing. Furthermore, theoretical and empirical analyses reveal that the new aggregation model performs better because its predictions are consistent: they converge to the true underlying function as the training size approaches infinity.
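
To make the factorized-training-plus-aggregation idea concrete, the minimal sketch below trains independent GP experts on disjoint subsets and combines their predictions with a robust-BCM-style precision weighting. It assumes a fixed squared-exponential kernel and exact GP experts; the function names, hyperparameters, and weighting details are illustrative assumptions, not the paper's reference implementation.

# Minimal sketch of BCM-style aggregation of distributed GP experts
# (fixed hyperparameters; names and weighting are illustrative only).
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel k(a, b) = s^2 exp(-||a-b||^2 / (2 l^2))."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_expert_predict(X, y, Xs, noise=1e-2):
    """Exact GP posterior mean and variance for one expert on its data subset."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = rbf_kernel(Xs, Xs).diagonal() - np.sum(v**2, axis=0) + noise
    return mean, var

def rbcm_aggregate(means, variances, prior_var):
    """Robust-BCM-style combination of M experts at the test points.

    Expert i gets weight beta_i = 0.5 * (log prior_var - log sigma_i^2);
    the aggregated precision subtracts the repeatedly counted prior,
    as in the Bayesian committee machine family of models.
    """
    betas = 0.5 * (np.log(prior_var) - np.log(variances))        # shape (M, n*)
    prec = np.sum(betas / variances, axis=0) \
           + (1.0 - np.sum(betas, axis=0)) / prior_var           # aggregated precision
    agg_var = 1.0 / prec
    agg_mean = agg_var * np.sum(betas * means / variances, axis=0)
    return agg_mean, agg_var

# Toy usage: factorized training on disjoint subsets, then closed-form aggregation.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(600, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(600)
Xs = np.linspace(-3, 3, 50)[:, None]

subsets = np.array_split(rng.permutation(600), 4)                # 4 disjoint experts
preds = [gp_expert_predict(X[idx], y[idx], Xs) for idx in subsets]
means = np.stack([m for m, _ in preds])
variances = np.stack([v for _, v in preds])
prior_var = rbf_kernel(Xs[:1], Xs[:1])[0, 0] + 1e-2              # prior variance at a test point
mu, var = rbcm_aggregate(means, variances, prior_var)

Because each expert only factorizes the training data and the combination rule is closed form, both the expert fits and the final aggregation can be run in parallel across machines, which is the property the abstract emphasizes.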
