Distributed Variational Inference in Sparse Gaussian Process Regression and Latent Variable Models

Gaussian processes (GPs) are a powerful tool for probabilistic inference over functions. They have been applied to both regression and non-linear dimensionality reduction, and offer desirable properties such as uncertainty estimates, robustness to over-fitting, and principled ways of tuning hyper-parameters. However, the scalability of these models to big datasets remains an active topic of research. We introduce a novel re-parametrisation of variational inference for sparse GP regression and latent variable models that allows for an efficient distributed algorithm. This is achieved by exploiting the fact that the data decouple given the inducing points, which lets us re-formulate the evidence lower bound in a Map-Reduce setting. We show that the inference scales well with data and computational resources, while preserving a balanced distribution of the load among the nodes. We further demonstrate the utility of the approach in scaling Gaussian processes to big data: GP performance improves with increasing amounts of data in both regression (on flight data with 2 million records) and latent variable modelling (on MNIST), and GPs outperform many models commonly used for big data.
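To make the decoupling concrete: given the inducing inputs, the collapsed variational lower bound of Titsias (2009) depends on the data only through a handful of fixed-size sums over data points, so each node can compute partial sums on its own chunk (map) and a single node can combine them and evaluate the bound (reduce). The sketch below is a minimal single-machine illustration of that structure, not the authors' implementation; it assumes an RBF kernel with a Gaussian likelihood, and the names rbf, partial_stats and collapsed_bound, along with all hyper-parameter values, are illustrative choices.

```python
import numpy as np

def rbf(A, B, ls=1.0, var=1.0):
    # Squared-exponential kernel matrix k(A, B).
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return var * np.exp(-0.5 * d2 / ls**2)

def partial_stats(X_c, y_c, Z, ls=1.0, var=1.0):
    # Map step: a node sees only its chunk (X_c, y_c) plus the shared
    # inducing inputs Z, and emits fixed-size (at most M x M) statistics.
    Kuf = rbf(Z, X_c, ls, var)            # M x n_c cross-covariance
    return (Kuf @ Kuf.T,                  # sum_n k_un k_un^T
            Kuf @ y_c,                    # sum_n y_n k_un
            float(y_c @ y_c),             # sum_n y_n^2
            var * X_c.shape[0],           # sum_n k(x_n, x_n) for this kernel
            X_c.shape[0])                 # chunk size

def collapsed_bound(stats, Z, sigma2=0.1, ls=1.0, var=1.0, jitter=1e-6):
    # Reduce step: sum the per-node statistics and evaluate the collapsed
    # variational bound (Titsias, 2009) from the global sums alone.
    Phi = sum(s[0] for s in stats)
    psi = sum(s[1] for s in stats)
    yy  = sum(s[2] for s in stats)
    trK = sum(s[3] for s in stats)
    N   = sum(s[4] for s in stats)
    Kuu = rbf(Z, Z, ls, var) + jitter * np.eye(Z.shape[0])
    A = Kuu + Phi / sigma2
    # log|sigma^2 I + Q| and the quadratic term via the matrix inversion lemma.
    logdet = (N * np.log(2.0 * np.pi * sigma2)
              + np.linalg.slogdet(A)[1] - np.linalg.slogdet(Kuu)[1])
    quad  = (yy - psi @ np.linalg.solve(A, psi) / sigma2) / sigma2
    trace = (trK - np.trace(np.linalg.solve(Kuu, Phi))) / sigma2
    return -0.5 * (logdet + quad + trace)

# Toy usage: four "nodes", each mapping over a quarter of the data.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, (1000, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(1000)
Z = np.linspace(-3.0, 3.0, 20)[:, None]
stats = [partial_stats(X[i::4], y[i::4], Z) for i in range(4)]  # map
print(collapsed_bound(stats, Z))                                # reduce
```

In a real Map-Reduce deployment the four map calls would run on separate nodes, and only the O(M^2) statistics, never the raw data, would travel over the network; this is what keeps the per-node load balanced as data and nodes are added.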
