A Survey on Bayesian Deep Learning

A comprehensive artificial intelligence system needs to not only perceive the environment with different “senses” (e.g., seeing and hearing) but also infer the world’s conditional (or even causal) relations and corresponding uncertainty. The past decade has seen major advances in many perception tasks, such as visual object recognition and speech recognition, using deep learning models. For higher-level inference, however, probabilistic graphical models with their Bayesian nature are still more powerful and flexible. In recent years, Bayesian deep learning has emerged as a unified probabilistic framework to tightly integrate deep learning and Bayesian models. In this general framework, the perception of text or images using deep learning can boost the performance of higher-level inference and, in turn, the feedback from the inference process is able to enhance the perception of text or images. This survey provides a comprehensive introduction to Bayesian deep learning and reviews its recent applications to recommender systems, topic models, control, and so on. We also discuss the relationship and differences between Bayesian deep learning and other related topics, such as the Bayesian treatment of neural networks.
