Deep Latent-Variable Kernel Learning

Deep kernel learning (DKL) leverages the connection between Gaussian processes (GPs) and neural networks (NNs) to build an end-to-end hybrid model. It combines the capability of NNs to learn rich representations from massive data with the nonparametric property of GPs, which provides automatic regularization through a tradeoff between model fit and model complexity. However, the deterministic NN encoder may weaken the regularization of the subsequent GP part, especially on small datasets, because the latent representation is left unconstrained. We therefore present a complete deep latent-variable kernel learning (DLVKL) model in which the latent variables perform stochastic encoding to regularize the representation. We further enhance DLVKL in two ways: 1) an expressive variational posterior built from a neural stochastic differential equation (NSDE) to improve the approximation quality and 2) a hybrid prior that takes knowledge from both the SDE prior and the posterior to arrive at a flexible tradeoff. Extensive experiments indicate that DLVKL-NSDE performs similarly to a well-calibrated GP on small datasets and shows superiority on large datasets.
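
To make the core idea concrete, below is a minimal sketch of stochastic-encoding deep kernel learning in plain PyTorch. It is an illustration under simplifying assumptions, not the paper's implementation: the encoder architecture, the standard-normal prior on the latents, and the exact (rather than sparse, stochastic-variational) GP likelihood are placeholders, and the NSDE posterior and hybrid prior are omitted.

```python
# Minimal sketch of stochastic-encoding deep kernel learning (illustrative
# assumptions throughout; the paper itself uses a sparse GP, an SDE-based
# variational posterior, and a hybrid prior rather than N(0, I)).
import torch
import torch.nn as nn

class StochasticEncoder(nn.Module):
    """Maps inputs x to a Gaussian posterior q(z|x) = N(mu(x), diag(var(x)))."""
    def __init__(self, d_in, d_latent, d_hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, d_hidden), nn.Tanh())
        self.mu = nn.Linear(d_hidden, d_latent)
        self.logvar = nn.Linear(d_hidden, d_latent)

    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.logvar(h)

def rbf_kernel(z, lengthscale, variance):
    """Squared-exponential kernel evaluated on the latent representation z."""
    sq_dist = (z.unsqueeze(1) - z.unsqueeze(0)).pow(2).sum(-1)
    return variance * torch.exp(-0.5 * sq_dist / lengthscale**2)

def dlvkl_style_loss(x, y, encoder, lengthscale=1.0, variance=1.0, noise=1e-2):
    """Negative GP marginal likelihood on sampled latents plus a KL regularizer.

    The KL term against a N(0, I) prior penalizes a 'free' latent
    representation; plain DKL corresponds to dropping it and making the
    encoder deterministic.
    """
    mu, logvar = encoder(x)
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
    n = x.shape[0]
    K = rbf_kernel(z, lengthscale, variance) + noise * torch.eye(n)
    L = torch.linalg.cholesky(K)
    alpha = torch.cholesky_solve(y, L)
    # Negative log marginal likelihood (constant 0.5 * n * log(2*pi) omitted).
    nll = 0.5 * (y.T @ alpha).squeeze() + torch.log(torch.diagonal(L)).sum()
    # KL( N(mu, exp(logvar)) || N(0, I) ), summed over data points and dims.
    kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum()
    return nll + kl

# Usage on toy 1-D regression data.
torch.manual_seed(0)
x = torch.linspace(-3, 3, 50).unsqueeze(-1)
y = torch.sin(x) + 0.1 * torch.randn_like(x)
enc = StochasticEncoder(d_in=1, d_latent=2)
loss = dlvkl_style_loss(x, y, enc)
loss.backward()  # gradients flow into the encoder through the sampled latents
```

The KL term is what distinguishes this sketch from plain DKL: removing it and making the encoder deterministic recovers the unregularized latent representation that the abstract identifies as the weakness on small datasets.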
