Deep Learning of Representations: Looking Forward (June 2013)

Deep learning research aims at discovering learning algorithms that discover multiple levels of distributed representations, with higher levels representing more abstract concepts. Although the study of deep learning has already led to impressive theoretical results, learning algorithms, and breakthrough experiments, several challenges lie ahead. This paper proposes to examine some of these challenges, centering on the questions of scaling deep learning algorithms to much larger models and datasets, reducing optimization difficulties due to ill-conditioning or local minima, designing more efficient and powerful inference and sampling procedures, and learning to disentangle the factors of variation underlying the observed data. It also proposes a few forward-looking research directions aimed at overcoming these challenges.
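To make "multiple levels of distributed representations" concrete, the following is a minimal illustrative sketch (not code from the paper) of greedy layer-wise training of a stacked denoising autoencoder, one of the representation-learning building blocks this line of work builds on. All class names, hyperparameters, and the toy data below are assumptions chosen only to keep the NumPy example self-contained and runnable.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DenoisingAutoencoder:
    # Hypothetical helper class for this sketch; not part of the paper.
    def __init__(self, n_in, n_hidden, corruption=0.3, lr=0.1):
        self.W = rng.normal(0.0, 0.01, size=(n_in, n_hidden))  # tied weights
        self.b_h = np.zeros(n_hidden)
        self.b_v = np.zeros(n_in)
        self.corruption = corruption
        self.lr = lr

    def encode(self, x):
        return sigmoid(x @ self.W + self.b_h)

    def train_step(self, x):
        # Corrupt the input by randomly zeroing entries, then reconstruct
        # the clean input from the corrupted one (denoising criterion).
        mask = rng.random(x.shape) > self.corruption
        x_tilde = x * mask
        h = self.encode(x_tilde)
        x_hat = sigmoid(h @ self.W.T + self.b_v)
        # Squared-error reconstruction loss and its gradients (tied weights).
        err = x_hat - x
        d_out = err * x_hat * (1 - x_hat)        # gradient at decoder pre-activation
        d_h = (d_out @ self.W) * h * (1 - h)     # backpropagated to hidden pre-activation
        grad_W = x_tilde.T @ d_h + d_out.T @ h   # encoder and decoder share W
        self.W -= self.lr * grad_W / x.shape[0]
        self.b_h -= self.lr * d_h.mean(axis=0)
        self.b_v -= self.lr * d_out.mean(axis=0)
        return float((err ** 2).mean())

# Greedy layer-wise pretraining: train layer 1 on the raw inputs, then train
# layer 2 on layer 1's codes, giving two successively more abstract levels.
X = rng.random((256, 64))          # toy data standing in for real observations
layer1 = DenoisingAutoencoder(64, 32)
layer2 = DenoisingAutoencoder(32, 16)
for _ in range(50):
    layer1.train_step(X)
H1 = layer1.encode(X)
for _ in range(50):
    layer2.train_step(H1)
H2 = layer2.encode(H1)             # top-level representation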
