Chapter 7 – Unsupervised deep learning: A short review

Neural networks with several hidden layers have recently become a highly successful and popular research topic in machine learning, owing to their excellent performance on many benchmark problems and in real applications. A key idea in deep learning is to learn not only the nonlinear mapping between input and output vectors but also the underlying structure of the (input) data vectors. In this chapter, we first consider the difficulties that arise when deep networks are trained with backpropagation-type algorithms. We then review the main architectures used in unsupervised deep learning: restricted Boltzmann machines, deep belief networks, deep Boltzmann machines, and nonlinear autoencoders. In the latter part of the chapter, we discuss in more detail the recently developed neural autoregressive distribution estimator (NADE) and its variants.
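
As a concrete illustration of the first family of models reviewed in the chapter, here is a minimal NumPy sketch of a single contrastive-divergence (CD-1) update for a binary-binary restricted Boltzmann machine. The layer sizes, learning rate, and random batch are illustrative placeholders, not settings taken from the chapter.

import numpy as np

rng = np.random.default_rng(0)

n_visible, n_hidden = 784, 256            # illustrative layer sizes
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b = np.zeros(n_visible)                   # visible biases
c = np.zeros(n_hidden)                    # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr=0.05):
    # One CD-1 step on a batch of binary data; parameters are updated in place.
    h0 = sigmoid(v0 @ W + c)                       # positive-phase hidden probabilities
    h0_sample = (rng.random(h0.shape) < h0).astype(float)
    v1 = sigmoid(h0_sample @ W.T + b)              # one-step reconstruction
    h1 = sigmoid(v1 @ W + c)                       # negative-phase hidden probabilities
    W += lr * (v0.T @ h0 - v1.T @ h1) / len(v0)    # approximate log-likelihood gradient
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (h0 - h1).mean(axis=0)

# Usage with random binary "data" standing in for, e.g., binarized images:
batch = (rng.random((32, n_visible)) < 0.5).astype(float)
cd1_update(batch, W, b, c)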

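Because the latter part of the chapter concentrates on NADE, the following sketch shows its core computation: NADE factorizes the joint distribution as p(x) = prod_d p(x_d | x_{<d}) and shares the input-to-hidden weights across the conditionals, so log p(x) for a binary vector can be evaluated in O(DH) time by accumulating the hidden pre-activations. The dimensions and untrained random parameters below are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)

D, H = 784, 256                           # illustrative input and hidden dimensions
W = 0.01 * rng.standard_normal((H, D))    # shared input-to-hidden weights
V = 0.01 * rng.standard_normal((D, H))    # hidden-to-output weights
b = np.zeros(D)                           # output biases
c = np.zeros(H)                           # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def nade_log_likelihood(x):
    # log p(x) = sum_d log p(x_d | x_<d), computed in O(D*H).
    a = c.copy()                          # running pre-activation for the hidden units
    log_p = 0.0
    for d in range(D):
        h = sigmoid(a)                    # hidden units given x_<d
        p_d = sigmoid(b[d] + V[d] @ h)    # p(x_d = 1 | x_<d)
        log_p += np.log(p_d if x[d] == 1 else 1.0 - p_d)
        a += W[:, d] * x[d]               # extend the conditioning to x_<(d+1)
    return log_p

x = (rng.random(D) < 0.5).astype(float)
print(nade_log_likelihood(x))
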