Towards Bayesian Deep Learning: A Survey

While perception tasks such as visual object recognition and text understanding play an important role in human intelligence, the subsequent tasks that involve inference, reasoning, and planning require an even higher level of intelligence. The past few years have seen major advances in many perception tasks using deep learning models. For higher-level inference, however, probabilistic graphical models, with their Bayesian nature, are still more powerful and flexible. To achieve integrated intelligence that involves both perception and inference, it is naturally desirable to tightly integrate deep learning and Bayesian models within a principled probabilistic framework, which we call Bayesian deep learning. In this unified framework, the perception of text or images using deep learning can boost the performance of higher-level inference and, in return, feedback from the inference process can enhance the perception of text or images. This survey provides a general introduction to Bayesian deep learning and reviews its recent applications to recommender systems, topic models, and control. We also discuss the relationship and differences between Bayesian deep learning and related topics such as the Bayesian treatment of neural networks.
