Random Feature Expansions for Deep Gaussian Processes

The composition of multiple Gaussian Processes as a Deep Gaussian Process (DGP) enables a deep probabilistic nonparametric approach to flexibly tackle complex machine learning problems with sound quantification of uncertainty. Existing inference approaches for DGP models have limited scalability and are notoriously cumbersome to construct. In this work we introduce a novel formulation of DGPs based on random feature expansions that we train using stochastic variational inference. This yields a practical learning framework which significantly advances the state-of-the-art in inference for DGPs, and enables accurate quantification of uncertainty. We extensively showcase the scalability and performance of our proposal on several datasets with up to 8 million observations, and various DGP architectures with up to 30 hidden layers.
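The core idea lends itself to a compact illustration. Below is a minimal NumPy sketch (not the authors' implementation; all names and hyperparameters are illustrative) of how a random Fourier feature expansion turns each GP layer with an RBF kernel into a linear model in feature space, and how stacking two such layers yields one sample from a DGP prior. The variational treatment of the weights and spectral frequencies described in the paper is omitted; here the weights are simply drawn from a standard-normal prior.

```python
import numpy as np

def random_fourier_features(X, n_features, lengthscale, rng):
    """Map inputs X to random Fourier features so that
    phi(X) @ phi(X').T approximates an RBF kernel matrix
    (Rahimi & Recht, 2007)."""
    d = X.shape[1]
    # Frequencies drawn from the RBF kernel's spectral density,
    # a Gaussian with standard deviation 1 / lengthscale.
    omega = rng.normal(0.0, 1.0 / lengthscale, size=(d, n_features))
    proj = X @ omega
    # cos/sin pairs; the 1/sqrt(n_features) scaling makes the inner
    # product an unbiased Monte Carlo estimate of the kernel.
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(n_features)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # 100 toy inputs in 5 dimensions

# Each GP layer collapses to linear regression on its random features;
# the weights below are prior samples rather than variationally learned.
phi1 = random_fourier_features(X, n_features=256, lengthscale=1.0, rng=rng)
hidden = phi1 @ rng.normal(size=(phi1.shape[1], 3))   # layer 1 -> 3 hidden GPs

phi2 = random_fourier_features(hidden, n_features=256, lengthscale=1.0, rng=rng)
output = phi2 @ rng.normal(size=(phi2.shape[1], 1))   # layer 2 -> scalar output
print(output.shape)  # (100, 1): one DGP prior sample per input
```

Because every layer reduces to matrix products against sampled features, the whole model can be trained end-to-end with stochastic variational inference on mini-batches, which is what gives the approach its scalability.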
