Efficient Deep Gaussian Process Models for Variable-Sized Inputs

Deep Gaussian processes (DGPs) have appealing Bayesian properties, can handle variable-sized data, and learn deep features. Their limitation is that they do not scale well with the size of the data. Existing approaches address this with a deep random feature (DRF) expansion model, which makes inference tractable by approximating the DGP. However, DRF is not suitable for variable-sized inputs such as trees, graphs, and sequences. We introduce GP-DRF, a novel Bayesian model with an input layer of GPs followed by DRF layers. The key advantage is that the combination of GP and DRF yields a tractable model that can both handle variable-sized inputs and learn deep, long-range dependency structures in the data. We provide a novel, efficient method to jointly infer the posterior over the GP's latent vectors and over the DRF's internal weights and random frequencies. Our experiments show that GP-DRF outperforms both the standard GP model and the DRF model across many datasets. Furthermore, they demonstrate that GP-DRF enables improved uncertainty quantification compared to GP and DRF alone, as assessed by the Bhattacharyya distance.
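To make the architecture concrete, the sketch below illustrates the building block the DRF layers rely on: a random Fourier feature map in the style of Rahimi and Recht, whose inner product approximates an RBF kernel. This is a minimal illustration, not the authors' implementation; the function names, layer sizes, and the fixed (rather than variationally inferred) weights are all assumptions for exposition.

```python
import numpy as np

def rff_layer(X, n_features, lengthscale=1.0, rng=None):
    """Random Fourier feature map approximating an RBF-kernel GP layer.

    Draws random frequencies Omega ~ N(0, 1/lengthscale^2) and phases
    b ~ U[0, 2*pi]; then phi(x)^T phi(x') approximates k(x, x').
    """
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    Omega = rng.normal(0.0, 1.0 / lengthscale, size=(d, n_features))
    b = rng.uniform(0.0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ Omega + b)

# Hypothetical two-layer DRF stack on a fixed-size representation.
# In GP-DRF, that representation would come from the input GP layer,
# which maps a variable-sized object (tree, graph, sequence) to a
# latent vector; here we use random vectors as a stand-in.
rng = np.random.default_rng(0)
Z = rng.normal(size=(32, 8))     # stand-in for GP-layer latent vectors
H1 = rff_layer(Z, n_features=100, rng=1)
W1 = rng.normal(size=(100, 8))   # weights; inferred variationally in the paper
H2 = rff_layer(H1 @ W1, n_features=100, rng=2)
W2 = rng.normal(size=(100, 1))
f = H2 @ W2                      # deep random-feature output
print(f.shape)                   # (32, 1)
```

In the full model, the random frequencies (Omega) and the weights (W1, W2) are treated as random variables with approximate posteriors rather than the fixed draws used in this sketch.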
