A Sparse Expansion For Deep Gaussian Processes

Deep Gaussian Processes (DGP) provide a non-parametric approach to quantifying the uncertainty of complex deep machine learning models. Conventional inferential methods for DGP models can suffer from high computational complexity, as they require large-scale operations with kernel matrices for training and inference. In this work, we propose an efficient scheme for accurate inference and prediction based on a class of Gaussian Processes called Tensor Markov Gaussian Processes (TMGPs). We construct an induced approximation of TMGPs, referred to as the hierarchical expansion. Next, we develop a deep TMGP (DTMGP) model as the composition of multiple hierarchical expansions of TMGPs. The proposed DTMGP model has the following properties: (1) the outputs of each activation function are deterministic, while the weights are drawn independently from a standard Gaussian distribution; (2) in training or prediction, only O(polylog(M)) (out of M) activation functions have non-zero outputs, which significantly boosts computational efficiency. Our numerical experiments on real datasets show the superior computational efficiency of DTMGP compared with other DGP models.
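For intuition, the following is a minimal, hypothetical sketch (not the paper's implementation) of a layer in this spirit: a deterministic one-dimensional hierarchical hat-function basis, of which only O(log M) of the M features are non-zero for any input, combined with weights drawn i.i.d. from a standard Gaussian. The function names and the specific basis are illustrative assumptions rather than the authors' construction.

```python
import numpy as np

def hierarchical_hat_features(x, n_levels):
    """Evaluate a 1-D hierarchical hat-function basis at points x in [0, 1].

    Level l contributes 2**l features, but their supports are disjoint,
    so for any single input at most n_levels (= O(log M)) of the
    M = 2**n_levels - 1 features are non-zero.
    """
    x = np.atleast_1d(x)
    feats = np.zeros((x.shape[0], 2 ** n_levels - 1))
    col = 0
    for level in range(n_levels):
        n_nodes = 2 ** level
        h = 1.0 / n_nodes                      # support width at this level
        centers = (np.arange(n_nodes) + 0.5) * h
        for c in centers:
            # tent function of height 1 centered at c with half-width h/2
            feats[:, col] = np.maximum(0.0, 1.0 - np.abs(x - c) / (h / 2))
            col += 1
    return feats

class SparseGaussianLayer:
    """One layer: deterministic sparse features times i.i.d. N(0, 1) weights."""

    def __init__(self, n_levels, out_dim, rng=None):
        rng = np.random.default_rng(rng)
        self.n_levels = n_levels
        m = 2 ** n_levels - 1
        self.W = rng.standard_normal((m, out_dim))   # weights ~ N(0, 1)

    def __call__(self, x):
        # each row of phi has only O(log M) non-zero entries
        phi = hierarchical_hat_features(x, self.n_levels)
        return phi @ self.W

# Compose two layers into a toy "deep" model; squash intermediate
# values back into [0, 1] so the next layer's basis applies.
layer1 = SparseGaussianLayer(n_levels=5, out_dim=1, rng=0)
layer2 = SparseGaussianLayer(n_levels=5, out_dim=1, rng=1)
x = np.linspace(0.0, 1.0, 7)
h = 1.0 / (1.0 + np.exp(-layer1(x)))
y = layer2(h.ravel())
print(y.shape)   # (7, 1)
```

Because the per-level supports are disjoint, the per-input cost of a forward pass scales with the number of levels rather than with the total number of features, which is the kind of sparsity the abstract attributes to DTMGP.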
