Efficient Deep Gaussian Process Models for Variable-Sized Inputs

Deep Gaussian processes (DGPs) have appealing Bayesian properties, can handle variable-sized data, and learn deep features. Their limitation is that they do not scale well with the size of the data. Existing approaches address this with a deep random feature (DRF) expansion model, which makes inference tractable by approximating the DGP. However, DRF is not suitable for variable-sized inputs such as trees, graphs, and sequences. We introduce GP-DRF, a novel Bayesian model with an input layer of GPs followed by DRF layers. The key advantage is that the combination of GP and DRF yields a tractable model that can both handle variable-sized inputs and learn deep, long-range dependency structures in the data. We provide a novel, efficient method to jointly infer the posterior over the GP's latent vectors and over the DRF's internal weights and random frequencies. Our experiments show that GP-DRF outperforms both the standard GP model and the DRF model across many datasets. Furthermore, they demonstrate that GP-DRF enables improved uncertainty quantification compared to GP and DRF alone, as assessed by the Bhattacharyya distance.
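To make the architecture concrete, the sketch below illustrates the building block the DRF layers rely on: a random Fourier feature map in the style of Rahimi and Recht, whose inner product approximates an RBF kernel. This is a minimal illustration, not the authors' implementation; the function names, layer sizes, and the fixed (rather than variationally inferred) weights are all assumptions for exposition.

```python
import numpy as np

def rff_layer(X, n_features, lengthscale=1.0, rng=None):
    """Random Fourier feature map approximating an RBF-kernel GP layer.

    Draws random frequencies Omega ~ N(0, 1/lengthscale^2) and phases
    b ~ U[0, 2*pi]; then phi(x)^T phi(x') approximates k(x, x').
    """
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    Omega = rng.normal(0.0, 1.0 / lengthscale, size=(d, n_features))
    b = rng.uniform(0.0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ Omega + b)

# Hypothetical two-layer DRF stack on a fixed-size representation.
# In GP-DRF, that representation would come from the input GP layer,
# which maps a variable-sized object (tree, graph, sequence) to a
# latent vector; here we use random vectors as a stand-in.
rng = np.random.default_rng(0)
Z = rng.normal(size=(32, 8))     # stand-in for GP-layer latent vectors
H1 = rff_layer(Z, n_features=100, rng=1)
W1 = rng.normal(size=(100, 8))   # weights; inferred variationally in the paper
H2 = rff_layer(H1 @ W1, n_features=100, rng=2)
W2 = rng.normal(size=(100, 1))
f = H2 @ W2                      # deep random-feature output
print(f.shape)                   # (32, 1)
```

In the full model, the random frequencies (Omega) and the weights (W1, W2) are treated as random variables with approximate posteriors rather than the fixed draws used in this sketch.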
