Stochastic Variational Deep Kernel Learning

Deep kernel learning combines the non-parametric flexibility of kernel methods with the inductive biases of deep learning architectures. We propose a novel deep kernel learning model and stochastic variational inference procedure that generalizes deep kernel learning approaches to enable classification, multi-task learning, additive covariance structures, and stochastic gradient training. Specifically, we apply additive base kernels to subsets of output features from deep neural architectures, and jointly learn the parameters of the base kernels and deep network through a Gaussian process marginal likelihood objective. Within this framework, we derive an efficient form of stochastic variational inference which leverages local kernel interpolation, inducing points, and structure-exploiting algebra. We show improved performance over standalone deep networks, SVMs, and state-of-the-art scalable Gaussian processes on several classification benchmarks, including an airline delay dataset containing 6 million training points, CIFAR, and ImageNet.
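
The additive construction described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the two-layer network `mlp_features`, the choice of RBF base kernels, the feature subsets, and the lengthscales are all assumptions made for illustration, and the joint training of network and kernel parameters through the marginal likelihood and stochastic variational inference is not shown.

```python
import numpy as np

def mlp_features(X, W1, b1, W2, b2):
    """Hypothetical two-layer network standing in for the deep architecture."""
    H = np.tanh(X @ W1 + b1)
    return H @ W2 + b2

def rbf_kernel(A, B, lengthscale):
    """RBF base kernel between the rows of A and the rows of B."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def additive_deep_kernel(X1, X2, params, subsets, lengthscales):
    """Sum of base kernels, each applied to one subset of network output features."""
    F1 = mlp_features(X1, *params)
    F2 = mlp_features(X2, *params)
    K = np.zeros((X1.shape[0], X2.shape[0]))
    for idx, ell in zip(subsets, lengthscales):
        K += rbf_kernel(F1[:, idx], F2[:, idx], ell)
    return K

# Toy setup (all values are illustrative assumptions).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                      # 5 inputs with 3 dimensions
params = (rng.normal(size=(3, 8)), np.zeros(8),  # hidden layer weights and biases
          rng.normal(size=(8, 4)), np.zeros(4))  # output layer: 4 features
subsets = [[0, 1], [2, 3]]                       # two disjoint feature subsets
lengthscales = [1.0, 2.0]                        # one lengthscale per base kernel

K = additive_deep_kernel(X, X, params, subsets, lengthscales)
```

In this sketch the resulting `K` is a sum of valid kernels evaluated on deterministic features, so it is itself a valid (symmetric, positive semi-definite) covariance matrix; in the full model the network weights and kernel hyperparameters would be learned jointly rather than fixed at random values.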
