Deep Latent-Variable Kernel Learning

Deep kernel learning (DKL) exploits the connection between Gaussian processes (GPs) and neural networks (NNs) to build an end-to-end hybrid model. It combines the ability of NNs to learn rich representations from massive data with the non-parametric nature of GPs, which provides automatic regularization through a trade-off between model fit and model complexity. However, the deterministic encoder may weaken the regularization of the subsequent GP, especially on small datasets, because the latent representation is left unconstrained. We therefore present a complete deep latent-variable kernel learning (DLVKL) model in which latent variables perform a stochastic encoding of the inputs, yielding a regularized representation. We further enhance DLVKL in two respects: (i) an expressive variational posterior built from a neural stochastic differential equation (NSDE) to improve the approximation quality, and (ii) a hybrid prior that blends the SDE prior with the posterior to achieve a flexible trade-off. Extensive experiments show that DLVKL-NSDE performs similarly to a well-calibrated GP on small datasets and outperforms existing deep GPs on large datasets.
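As a rough illustration of the stochastic-encoding idea (not the paper's implementation), the sketch below shows a Gaussian encoder that maps inputs to a latent distribution via the reparameterization trick and returns a KL penalty toward a standard-normal prior; a DLVKL-style objective would combine this KL term with the sparse GP's variational bound on the latent inputs. The class name `StochasticEncoder`, the weight `beta`, and the standard-normal prior are illustrative assumptions, and the GP head is abstracted away.

```python
# Hedged sketch: a Gaussian "stochastic encoder" with a KL regularizer toward
# a standard-normal prior -- the mechanism by which DLVKL-style models
# regularize the latent representation fed to the downstream GP.
# The GP layer itself is omitted; names here are illustrative, not the paper's API.
import torch
import torch.nn as nn


class StochasticEncoder(nn.Module):
    def __init__(self, d_in: int, d_latent: int, d_hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
        self.mean_head = nn.Linear(d_hidden, d_latent)
        self.logvar_head = nn.Linear(d_hidden, d_latent)

    def forward(self, x: torch.Tensor):
        h = self.net(x)
        mean, logvar = self.mean_head(h), self.logvar_head(h)
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
        z = mean + torch.exp(0.5 * logvar) * torch.randn_like(mean)
        # KL( N(mu, sigma^2) || N(0, I) ), summed over latent dims, averaged over the batch
        kl = 0.5 * (mean.pow(2) + logvar.exp() - logvar - 1.0).sum(dim=1).mean()
        return z, kl


if __name__ == "__main__":
    x = torch.randn(32, 10)                    # toy batch of inputs
    encoder = StochasticEncoder(d_in=10, d_latent=2)
    z, kl = encoder(x)
    # A DLVKL-style loss would look like: loss = -gp_elbo(z, y) + beta * kl,
    # where gp_elbo is the sparse GP bound on the latent inputs z (not shown).
    print(z.shape, kl.item())
```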
