Representation Learning: A Review and New Perspectives

The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.
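To make the notion of a learned representation concrete, the following is a minimal illustrative sketch (not taken from the paper) of a denoising autoencoder, one of the model families the review covers. It is a toy NumPy implementation under simplifying assumptions: synthetic 6-dimensional data near two prototypes, a single sigmoid hidden layer with tied weights, masking corruption, and plain per-example SGD on squared reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy data: noisy copies of two 6-D prototypes, so the inputs have
# low-dimensional structure that a good representation can capture.
protos = np.array([[1, 1, 0, 0, 1, 0],
                   [0, 0, 1, 1, 0, 1]], dtype=float)
X = protos[rng.integers(0, 2, size=200)] + 0.1 * rng.normal(size=(200, 6))

n_vis, n_hid = 6, 3
W = 0.1 * rng.normal(size=(n_vis, n_hid))   # tied encoder/decoder weights
b_h, b_v = np.zeros(n_hid), np.zeros(n_vis)

def train_epoch(lr=0.1, corrupt=0.3):
    """One SGD pass: corrupt each input, encode, reconstruct, update."""
    global W, b_h, b_v
    total = 0.0
    for x in X:
        x_t = x * (rng.random(n_vis) > corrupt)  # masking corruption
        h = sigmoid(x_t @ W + b_h)               # the learned representation
        r = sigmoid(h @ W.T + b_v)               # reconstruction of the CLEAN x
        total += float(np.sum((r - x) ** 2))
        d_r = 2 * (r - x) * r * (1 - r)          # grad at decoder pre-activation
        d_h = (d_r @ W) * h * (1 - h)            # grad at encoder pre-activation
        W -= lr * (np.outer(d_r, h) + np.outer(x_t, d_h))
        b_v -= lr * d_r
        b_h -= lr * d_h
    return total / len(X)

losses = [train_epoch() for _ in range(30)]
print(f"epoch 1 loss {losses[0]:.3f} -> epoch 30 loss {losses[-1]:.3f}")
```

Because the network must reconstruct the clean input from a corrupted one, the hidden code is pushed toward capturing the data's stable explanatory factors rather than memorizing individual inputs, which is the intuition behind using reconstruction criteria for representation learning.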

[1]  Karl Pearson F.R.S. LIII. On lines and planes of closest fit to systems of points in space , 1901 .

[2]  H. Hotelling Analysis of a complex of statistical variables into principal components. , 1933 .

[3]  D. Hubel,et al.  Receptive fields of single neurones in the cat's striate cortex , 1959, The Journal of physiology.

[4]  J. Besag Statistical Analysis of Non-Lattice Data , 1975 .

[5]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[6]  J. Friedman,et al.  Projection Pursuit Regression , 1981 .

[7]  Kunihiko Fukushima,et al.  Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position , 1982, Pattern Recognit..

[8]  Y. L. Cun Learning Process in an Asymmetric Threshold Network , 1986 .

[9]  Paul Smolensky,et al.  Information processing in dynamical systems: foundations of harmony theory , 1986 .

[10]  Yann LeCun,et al.  Learning processes in an asymmetric threshold network , 1986 .

[11]  Johan Håstad,et al.  Almost optimal lower bounds for small depth circuits , 1986, STOC '86.

[12]  Yann LeCun,et al.  Generalization and network design strategies , 1989 .

[13]  Geoffrey E. Hinton,et al.  Learning distributed representations of concepts. , 1989 .

[14]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[15]  Jordan B. Pollack,et al.  Recursive Distributed Representations , 1990, Artif. Intell..

[16]  Christian Jutten,et al.  Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture , 1991, Signal Process..

[17]  David Haussler,et al.  Unsupervised learning of distributions on binary vectors using two layer networks , 1991, NIPS 1991.

[18]  Yann LeCun,et al.  Tangent Prop - A Formalism for Specifying Selected Invariances in an Adaptive Network , 1991, NIPS.

[19]  Geoffrey E. Hinton,et al.  Self-organizing neural network that discovers surfaces in random-dot stereograms , 1992, Nature.

[20]  T. Poggio,et al.  Recognition and Structure from one 2D Model View: Observations on Prototypes, Object Classes and Symmetries , 1992 .

[21]  Yann LeCun,et al.  Efficient Pattern Recognition Using a New Transformation Distance , 1992, NIPS.

[22]  Radford M. Neal Connectionist Learning of Belief Networks , 1992, Artif. Intell..

[23]  Geoffrey E. Hinton,et al.  Autoencoders, Minimum Description Length and Helmholtz Free Energy , 1993, NIPS.

[24]  Yoshua Bengio A Connectionist Approach to Speech Recognition , 1993, Int. J. Pattern Recognit. Artif. Intell..

[25]  Geoffrey E. Hinton,et al.  Learning Mixture Models of Spatial Coherence , 1993, Neural Computation.

[26]  Rich Caruana,et al.  Learning Many Related Tasks at the Same Time with Backpropagation , 1994, NIPS.

[27]  Pierre Comon,et al.  Independent component analysis, A new concept? , 1994, Signal Process..

[28]  Yoshua Bengio,et al.  Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.

[29]  Henry S. Baird,et al.  Document image defect models , 1995 .

[30]  David J. Field,et al.  Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.

[31]  David J. Field,et al.  Sparse coding with an overcomplete basis set: A strategy employed by V1? , 1997, Vision Research.

[32]  Sam T. Roweis,et al.  EM Algorithms for PCA and Sensible PCA , 1997, NIPS 1997.

[33]  H. Sebastian Seung,et al.  Learning Continuous Attractors in Recurrent Networks , 1997, NIPS.

[34]  Terrence J. Sejnowski,et al.  The “independent components” of natural scenes are edge filters , 1997, Vision Research.

[35]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[36]  Bernhard Schölkopf,et al.  Nonlinear Component Analysis as a Kernel Eigenvalue Problem , 1998, Neural Computation.

[37]  Shun-ichi Amari,et al.  Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.

[38]  Alessandro Sperduti,et al.  A general framework for adaptive processing of data structures , 1998, IEEE Trans. Neural Networks.

[39]  T. Poggio,et al.  Hierarchical models of object recognition in cortex , 1999, Nature Neuroscience.

[40]  L. Younes On the convergence of markovian stochastic algorithms with rapidly decreasing ergodicity rates , 1999 .

[41]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[42]  Michael E. Tipping,et al.  Probabilistic Principal Component Analysis , 1999 .

[43]  J. Tenenbaum,et al.  A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.

[44]  Yoshua Bengio,et al.  A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..

[45]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[46]  Aapo Hyvärinen,et al.  Emergence of Phase- and Shift-Invariant Features by Decomposition of Natural Images into Independent Feature Subspaces , 2000, Neural Computation.

[47]  Aapo Hyvärinen,et al.  Topographic Independent Component Analysis , 2001, Neural Computation.

[48]  Erkki Oja,et al.  Independent Component Analysis , 2001 .

[49]  Geoffrey E. Hinton,et al.  Learning Sparse Topographic Representations with Products of Student-t Distributions , 2002, NIPS.

[50]  Geoffrey E. Hinton,et al.  Stochastic Neighbor Embedding , 2002, NIPS.

[51]  Aapo Hyvärinen,et al.  Temporal Coherence, Natural Image Sequences, and the Visual Cortex , 2002, NIPS.

[52]  Pascal Vincent,et al.  Manifold Parzen Windows , 2002, NIPS.

[53]  Terrence J. Sejnowski,et al.  Slow Feature Analysis: Unsupervised Learning of Invariances , 2002, Neural Computation.

[54]  Geoffrey E. Hinton Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[55]  J. van Leeuwen,et al.  Neural Networks: Tricks of the Trade , 2002, Lecture Notes in Computer Science.

[56]  Matthew Brand,et al.  Charting a Manifold , 2002, NIPS.

[57]  Patrice Y. Simard,et al.  Best practices for convolutional neural networks applied to visual document analysis , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..

[58]  D. Donoho,et al.  Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[59]  Nicolas Le Roux,et al.  Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering , 2003, NIPS.

[60]  Mikhail Belkin,et al.  Laplacian Eigenmaps for Dimensionality Reduction and Data Representation , 2003, Neural Computation.

[61]  D. Donoho,et al.  Hessian Eigenmaps : new locally linear embedding techniques for high-dimensional data , 2003 .

[62]  Konrad Paul Kording,et al.  How are complex cell properties adapted to the statistics of natural stimuli? , 2004, Journal of neurophysiology.

[63]  Kunihiko Fukushima,et al.  Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position , 1980, Biological Cybernetics.

[64]  Alan L. Yuille,et al.  The Convergence of Contrastive Divergences , 2004, NIPS.

[65]  Kilian Q. Weinberger,et al.  Unsupervised Learning of Image Manifolds by Semidefinite Programming , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[66]  Yoshua Bengio,et al.  Non-Local Manifold Tangent Learning , 2004, NIPS.

[67]  H. Bourlard,et al.  Auto-association by multilayer perceptrons and singular value decomposition , 1988, Biological Cybernetics.

[68]  Lawrence Cayton,et al.  Algorithms for manifold learning , 2005 .

[69]  Laurenz Wiskott,et al.  Slow feature analysis yields a rich repertoire of complex cell properties. , 2005, Journal of vision.

[70]  Johan Håstad,et al.  On the power of small-depth threshold circuits , 1991, computational complexity.

[71]  Aapo Hyvärinen,et al.  Estimation of Non-Normalized Statistical Models by Score Matching , 2005, J. Mach. Learn. Res..

[72]  Pascal Vincent,et al.  Non-Local Manifold Parzen Windows , 2005, NIPS.

[73]  Nicolas Le Roux,et al.  The Curse of Highly Variable Functions for Local Kernel Machines , 2005, NIPS.

[74]  Miguel Á. Carreira-Perpiñán,et al.  On Contrastive Divergence Learning , 2005, AISTATS.

[75]  Yoshua Bengio,et al.  Greedy Layer-Wise Training of Deep Networks , 2006, NIPS.

[76]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[77]  Daniel Marcu,et al.  Domain Adaptation for Statistical Classifiers , 2006, J. Artif. Intell. Res..

[78]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[79]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[80]  Marc'Aurelio Ranzato,et al.  Efficient Learning of Sparse Representations with an Energy-Based Model , 2006, NIPS.

[81]  Yee Whye Teh,et al.  Unsupervised Discovery of Nonlinear Structure Using Contrastive Backpropagation , 2006, Cogn. Sci..

[82]  Max Welling Donald,et al.  Products of Experts , 2007 .

[83]  Geoffrey E. Hinton,et al.  Restricted Boltzmann machines for collaborative filtering , 2007, ICML '07.

[84]  Roger B. Grosse,et al.  Shift-Invariance Sparse Coding for Audio Classification , 2007, UAI.

[85]  Honglak Lee,et al.  Sparse deep belief net model for visual area V2 , 2007, NIPS.

[86]  Bruno A. Olshausen,et al.  Learning Horizontal Connections in a Sparse Coding Model of Natural Images , 2007, NIPS.

[87]  Marc'Aurelio Ranzato,et al.  Sparse Feature Learning for Deep Belief Networks , 2007, NIPS.

[88]  Nicolas Le Roux,et al.  Learning the 2-D Topology of Images , 2007, NIPS.

[89]  Nicolas Le Roux,et al.  Topmoumoute Online Natural Gradient Algorithm , 2007, NIPS.

[90]  Yoshua Bengio,et al.  Scaling learning algorithms towards AI , 2007 .

[91]  Thomas Serre,et al.  Robust Object Recognition with Cortex-Like Mechanisms , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[92]  Rajat Raina,et al.  Self-taught learning: transfer learning from unlabeled data , 2007, ICML '07.

[93]  Aapo Hyvärinen,et al.  Some extensions of score matching , 2007, Comput. Stat. Data Anal..

[94]  Yoshua. Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[95]  Geoffrey E. Hinton,et al.  The Recurrent Temporal Restricted Boltzmann Machine , 2008, NIPS.

[96]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[97]  Ruslan Salakhutdinov,et al.  Evaluating probabilities under high-dimensional latent variable models , 2008, NIPS.

[98]  Geoffrey E. Hinton,et al.  Generative versus discriminative training of RBMs for classification of fMRI images , 2008, NIPS.

[99]  Jason Weston,et al.  A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.

[100]  Bruno A. Olshausen,et al.  Learning Transformational Invariants from Natural Movies , 2008, NIPS.

[101]  Tijmen Tieleman,et al.  Training restricted Boltzmann machines using approximations to the likelihood gradient , 2008, ICML '08.

[102]  Jason Weston,et al.  Deep learning via semi-supervised embedding , 2008, ICML '08.

[103]  Yoshua Bengio,et al.  Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[104]  Yoshua Bengio,et al.  Neural net language models , 2008, Scholarpedia.

[105]  Yoshua Bengio,et al.  Classification using discriminative restricted Boltzmann machines , 2008, ICML '08.

[106]  H. Sebastian Seung,et al.  Natural Image Denoising with Convolutional Networks , 2008, NIPS.

[107]  David M. Bradley,et al.  Differentiable Sparse Coding , 2008, NIPS.

[108]  Aapo Hyvärinen,et al.  Optimal Approximation of Signal Priors , 2008, Neural Computation.

[109]  Geoffrey E. Hinton,et al.  Using fast weights to improve persistent contrastive divergence , 2009, ICML '09.

[110]  Yoshua Bengio,et al.  Slow, Decorrelated Features for Pretraining Complex Cell-like Networks , 2009, NIPS.

[111]  Yoshua Bengio,et al.  Exploring Strategies for Training Deep Neural Networks , 2009, J. Mach. Learn. Res..

[112]  Honglak Lee,et al.  Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.

[113]  Yann LeCun,et al.  What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[114]  Quoc V. Le,et al.  Measuring Invariances in Deep Networks , 2009, NIPS.

[115]  Geoffrey E. Hinton,et al.  Factored conditional restricted Boltzmann Machines for modeling motion style , 2009, ICML '09.

[116]  Aapo Hyvärinen,et al.  Natural Image Statistics - A Probabilistic Approach to Early Computational Vision , 2009, Computational Imaging and Vision.

[117]  R. Fergus,et al.  Learning invariant features through topographic filter maps , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[118]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[119]  Max Welling,et al.  Herding Dynamic Weights for Partially Observed Random Field Models , 2009, UAI.

[120]  Pascal Vincent,et al.  Deep Learning using Robust Interdependent Codes , 2009, AISTATS.

[121]  Geoffrey E. Hinton,et al.  Deep Boltzmann Machines , 2009, AISTATS.

[122]  Jason Weston,et al.  Curriculum learning , 2009, ICML '09.

[123]  Yihong Gong,et al.  Nonlinear Learning using Local Coordinate Coding , 2009, NIPS.

[124]  Yoshua Bengio,et al.  Justifying and Generalizing Contrastive Divergence , 2009, Neural Computation.

[125]  Laurens van der Maaten,et al.  Learning a Parametric Embedding by Preserving Local Structure , 2009, AISTATS.

[126]  Honglak Lee,et al.  Unsupervised feature learning for audio classification using convolutional deep belief networks , 2009, NIPS.

[127]  Ruslan Salakhutdinov,et al.  Learning in Markov Random Fields using Tempered Transitions , 2009, NIPS.

[128]  Geoffrey E. Hinton,et al.  Semantic hashing , 2009, Int. J. Approx. Reason..

[129]  Hossein Mobahi,et al.  Deep learning from temporal coherence in video , 2009, ICML '09.

[130]  Hugo Larochelle,et al.  Efficient Learning of Deep Boltzmann Machines , 2010, AISTATS.

[131]  Aaron C. Courville,et al.  Understanding Representations Learned in Deep Architectures , 2010 .

[132]  Quoc V. Le,et al.  Tiled convolutional neural networks , 2010, NIPS.

[133]  Yoshua Bengio,et al.  Why Does Unsupervised Pre-training Help Deep Learning? , 2010, AISTATS.

[134]  Dong Yu,et al.  Sequential Labeling Using Deep-Structured Conditional Random Fields , 2010, IEEE Journal of Selected Topics in Signal Processing.

[135]  Geoffrey E. Hinton,et al.  Learning to Represent Spatial Transformations with Factored Higher-Order Boltzmann Machines , 2010, Neural Computation.

[136]  Yann LeCun,et al.  Emergence of Complex-Like Cells in a Temporal Product Network with Local Receptive Fields , 2010, ArXiv.

[137]  Jean Ponce,et al.  A Theoretical Analysis of Feature Pooling in Visual Recognition , 2010, ICML.

[138]  James Martens,et al.  Deep learning via Hessian-free optimization , 2010, ICML.

[139]  Yann LeCun,et al.  Convolutional Learning of Spatio-temporal Features , 2010, ECCV.

[140]  J. Andrew Bagnell,et al.  Boosted Backpropagation Learning for Training Deep Modular Networks , 2010, ICML.

[141]  Ilya Sutskever,et al.  On the Convergence Properties of Contrastive Divergence , 2010, AISTATS.

[142]  Ruslan Salakhutdinov,et al.  Learning Deep Boltzmann Machines using Adaptive MCMC , 2010, ICML.

[143]  Yann LeCun,et al.  Regularized estimation of image statistics by Score Matching , 2010, NIPS.

[144]  Tapani Raiko,et al.  Parallel tempering is efficient for learning restricted Boltzmann machines , 2010, The 2010 International Joint Conference on Neural Networks (IJCNN).

[145]  Geoffrey E. Hinton,et al.  Generating more realistic images using gated MRF's , 2010, NIPS.

[146]  Kevin Swersky,et al.  Inductive Principles for Learning Restricted Boltzmann Machines , 2010 .

[147]  Pascal Vincent,et al.  Tempered Markov Chain Monte Carlo for training of Restricted Boltzmann Machines , 2010, AISTATS.

[148]  Graham W. Taylor,et al.  Deconvolutional networks , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[149]  Y-Lan Boureau,et al.  Learning Convolutional Feature Hierarchies for Visual Recognition , 2010, NIPS.

[150]  Geoffrey E. Hinton,et al.  Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine , 2010, NIPS.

[151]  Yoshua Bengio,et al.  DECISION TREES DO NOT GENERALIZE TO NEW VARIATIONS , 2010, Comput. Intell..

[152]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[153]  Tong Zhang,et al.  Improved Local Coordinate Coding using Local Tangents , 2010, ICML.

[154]  Jason Weston,et al.  Large scale image annotation: learning to rank with joint word-image embeddings , 2010, Machine Learning.

[155]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[156]  Hariharan Narayanan,et al.  Sample Complexity of Testing the Manifold Hypothesis , 2010, NIPS.

[157]  Luca Maria Gambardella,et al.  Deep, Big, Simple Neural Nets for Handwritten Digit Recognition , 2010, Neural Computation.

[158]  A. Krizhevsky Convolutional Deep Belief Networks on CIFAR-10 , 2010 .

[159]  Nando de Freitas,et al.  Inductive Principles for Restricted Boltzmann Machine Learning , 2010, AISTATS.

[160]  Marc'Aurelio Ranzato,et al.  Fast Inference in Sparse Coding Algorithms with Applications to Object Recognition , 2010, ArXiv.

[161]  Nando de Freitas,et al.  A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning , 2010, ArXiv.

[162]  Shenghuo Zhu,et al.  Deep Coding Network , 2010, NIPS.

[163]  Pascal Vincent,et al.  Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010, J. Mach. Learn. Res..

[164]  Geoffrey E. Hinton,et al.  Modeling pixel means and covariances using factorized third-order boltzmann machines , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[165]  Geoffrey E. Hinton,et al.  Binary coding of speech spectrograms using a deep auto-encoder , 2010, INTERSPEECH.

[166]  Aapo Hyvärinen,et al.  Noise-contrastive estimation: A new estimation principle for unnormalized statistical models , 2010, AISTATS.

[167]  Geoffrey E. Hinton,et al.  Factored 3-Way Restricted Boltzmann Machines For Modeling Natural Images , 2010, AISTATS.

[168]  Yann LeCun,et al.  Learning Fast Approximations of Sparse Coding , 2010, ICML.

[169]  Joseph F. Murray,et al.  Convolutional Networks Can Learn to Generate Affinity Graphs for Image Segmentation , 2010, Neural Computation.

[170]  Yoshua Bengio,et al.  Algorithms for Hyper-Parameter Optimization , 2011, NIPS.

[171]  Nando de Freitas,et al.  Asymptotic Efficiency of Deterministic Estimators for Discrete Energy-Based Models: Ratio Matching and Pseudolikelihood , 2011, UAI.

[172]  Yann LeCun,et al.  Structured sparse coding via lateral inhibition , 2011, NIPS.

[173]  Veselin Stoyanov,et al.  Empirical Risk Minimization of Graphical Model Parameters Given Approximate Inference, Decoding, and Model Structure , 2011, AISTATS.

[174]  Quoc V. Le,et al.  On optimization methods for deep learning , 2011, ICML.

[175]  Ilya Sutskever,et al.  Learning Recurrent Neural Networks with Hessian-Free Optimization , 2011, ICML.

[176]  Radford M. Neal Probabilistic Inference Using Markov Chain Monte Carlo Methods , 2011 .

[177]  Andrew Y. Ng,et al.  Selecting Receptive Fields in Deep Networks , 2011, NIPS.

[178]  Pascal Vincent,et al.  Contractive Auto-Encoders: Explicit Invariance During Feature Extraction , 2011, ICML.

[179]  Tapani Raiko,et al.  Enhanced Gradient and Adaptive Learning Rate for Training Restricted Boltzmann Machines , 2011, ICML.

[180]  Geoffrey E. Hinton,et al.  Transforming Auto-Encoders , 2011, ICANN.

[181]  Pascal Vincent,et al.  Higher Order Contractive Auto-Encoder , 2011, ECML/PKDD.

[182]  Yoshua Bengio,et al.  On Tracking The Partition Function , 2011, NIPS.

[183]  Nando de Freitas,et al.  On Autoencoders and Score Matching for Energy Based Models , 2011, ICML.

[184]  Yann LeCun,et al.  Unsupervised Learning of Sparse Features for Scalable Audio Classification , 2011, ISMIR.

[185]  Quoc V. Le,et al.  Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis , 2011, CVPR 2011.

[186]  Quoc V. Le,et al.  ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning , 2011, NIPS.

[187]  Mohamed Chtourou,et al.  On the training of recurrent neural networks , 2011, Eighth International Multi-Conference on Systems, Signals & Devices.

[188]  Yoshua Bengio,et al.  Unsupervised Models of Images by Spikeand-Slab RBMs , 2011, ICML.

[189]  Stéphane Mallat,et al.  Group Invariant Scattering , 2011, ArXiv.

[190]  Julien Mairal,et al.  Structured sparsity through convex optimization , 2011, ArXiv.

[191]  Katherine A. Heller,et al.  Bayesian and L1 Approaches to Sparse Unsupervised Learning , 2011, ICML 2012.

[192]  Yoshua Bengio,et al.  Deep Sparse Rectifier Neural Networks , 2011, AISTATS.

[193]  Yoshua Bengio,et al.  Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach , 2011, ICML.

[194]  Andrew Y. Ng,et al.  The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization , 2011, ICML.

[195]  Dong Yu,et al.  Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription , 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.

[196]  Lukás Burget,et al.  Empirical Evaluation and Combination of Advanced Language Modeling Techniques , 2011, INTERSPEECH.

[197]  Yoshua Bengio,et al.  A Spike and Slab Restricted Boltzmann Machine , 2011, AISTATS.

[198]  Geoffrey E. Hinton,et al.  Transforming Autoencoders , 2011 .

[199]  Jörg Lücke,et al.  A Closed-Form EM Algorithm for Sparse Coding , 2011 .

[200]  Stéphane Mallat,et al.  Classification with scattering operators , 2010, CVPR 2011.

[201]  Pascal Vincent,et al.  A Connection Between Score Matching and Denoising Autoencoders , 2011, Neural Computation.

[202]  Pascal Vincent,et al.  The Manifold Tangent Classifier , 2011, NIPS.

[203]  Yoshua Bengio,et al.  On the Expressive Power of Deep Architectures , 2011, ALT.

[204]  Francis R. Bach,et al.  Structured Variable Selection with Sparsity-Inducing Norms , 2009, J. Mach. Learn. Res..

[205]  Nicolas Le Roux,et al.  Ask the locals: Multi-way local pooling for image recognition , 2011, 2011 International Conference on Computer Vision.

[206]  Yoshua Bengio,et al.  Large-Scale Learning of Embeddings with Reconstruction Sampling , 2011, ICML.

[207]  Jeffrey Pennington,et al.  Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection , 2011, NIPS.

[208]  Jiquan Ngiam,et al.  Learning Deep Energy Models , 2011, ICML.

[209]  Berin Martini,et al.  Large-Scale FPGA-based Convolutional Networks , 2011 .

[210]  John D. Lafferty,et al.  Learning image representations from the pixel level via hierarchical sparse coding , 2011, CVPR 2011.

[211]  Jason Weston,et al.  Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..

[212]  Miguel Lázaro-Gredilla,et al.  Spike and Slab Variational Inference for Multi-Task and Multiple Kernel Learning , 2011, NIPS.

[213]  Jeffrey Pennington,et al.  Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions , 2011, EMNLP.

[214]  Pascal Vincent,et al.  Quickly Generating Representative Samples from an RBM-Derived Process , 2011, Neural Computation.

[215]  Will Y. Zou Unsupervised learning of visual invariance with temporal coherence , 2011 .

[216]  Douglas Eck,et al.  Temporal Pooling and Multiscale Learning for Automatic Annotation and Ranking of Music Audio , 2011, ISMIR.

[217]  Geoffrey E. Hinton,et al.  On deep generative models with applications to recognition , 2011, CVPR 2011.

[218]  Rémi Gribonval,et al.  Should Penalized Least Squares Regression be Interpreted as Maximum A Posteriori Estimation? , 2011, IEEE Transactions on Signal Processing.

[219]  Yoshua Bengio,et al.  Random Search for Hyper-Parameter Optimization , 2012, J. Mach. Learn. Res..

[220]  Yoshua Bengio,et al.  Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription , 2012, ICML.

[221]  Jasper Snoek,et al.  Practical Bayesian Optimization of Machine Learning Algorithms , 2012, NIPS.

[222]  Tara N. Sainath,et al.  FUNDAMENTAL TECHNOLOGIES IN MODERN SPEECH RECOGNITION Digital Object Identifier 10.1109/MSP.2012.2205597 , 2012 .

[223]  Jürgen Schmidhuber,et al.  Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[224]  James J. DiCarlo,et al.  How Does the Brain Solve Visual Object Recognition? , 2012, Neuron.

[225]  Yoshua Bengio,et al.  Deep Learning of Representations for Unsupervised and Transfer Learning , 2011, ICML Unsupervised and Transfer Learning.

[226]  F. Savard Réseaux de neurones à relaxation entraînés par critère d'autoencodeur débruitant , 2012 .

[227]  Dong Yu,et al.  Conversational Speech Transcription Using Context-Dependent Deep Neural Networks , 2012, ICML.

[228]  Yoshua Bengio,et al.  On Training Deep Boltzmann Machines , 2012, ArXiv.

[229]  Yoshua Bengio,et al.  Practical Recommendations for Gradient-Based Training of Deep Architectures , 2012, Neural Networks: Tricks of the Trade.

[230]  Nitish Srivastava,et al.  Multimodal learning with deep Boltzmann machines , 2012, J. Mach. Learn. Res..

[231]  Yoshua Bengio,et al.  Unsupervised and Transfer Learning Challenge: a Deep Learning Approach , 2011, ICML Unsupervised and Transfer Learning.

[232]  Dong Yu,et al.  Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[233]  Jörg Lücke,et al.  Closed-Form EM for Sparse Coding and Its Application to Source Separation , 2011, LVA/ICA.

[234]  Kilian Q. Weinberger,et al.  Marginalized Denoising Autoencoders for Domain Adaptation , 2012, ICML.

[235]  Yoshua Bengio,et al.  Spike-and-Slab Sparse Coding for Unsupervised Feature Discovery , 2012, ArXiv.

[236]  Yoshua Bengio,et al.  A Generative Process for sampling Contractive Auto-Encoders , 2012, ICML 2012.

[237]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[238]  Grgoire Montavon,et al.  Neural Networks: Tricks of the Trade , 2012, Lecture Notes in Computer Science.

[239]  Klaus-Robert Müller,et al.  Efficient BackProp , 2012, Neural Networks: Tricks of the Trade.

[240]  Tapani Raiko,et al.  Deep Learning Made Easier by Linear Transformations in Perceptrons , 2012, AISTATS.

[241]  Geoffrey E. Hinton,et al.  Acoustic Modeling Using Deep Belief Networks , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[242]  Holger Schwenk,et al.  Large, Pruned or Continuous Space Language Models on a GPU for Statistical Machine Translation , 2012, WLM@NAACL-HLT.

[243]  Tara N. Sainath,et al.  Deep Neural Networks for Acoustic Modeling in Speech Recognition , 2012 .

[244]  Geoffrey E. Hinton A Practical Guide to Training Restricted Boltzmann Machines , 2012, Neural Networks: Tricks of the Trade.

[245]  Yoshua Bengio,et al.  Implicit Density Estimation by Local Moment Matching to Sample from Auto-Encoders , 2012, ArXiv.

[246]  Jason Weston,et al.  Joint Learning of Words and Meaning Representations for Open-Text Semantic Parsing , 2012, AISTATS.

[247]  Christopher K. I. Williams,et al.  Multiple Texture Boltzmann Machines , 2012, AISTATS.

[248]  Alexandre Allauzen,et al.  Structured Output Layer Neural Network Language Models for Speech Recognition , 2013, IEEE Transactions on Audio, Speech, and Language Processing.

[249]  Razvan Pascanu,et al.  Natural Gradient Revisited , 2013, ICLR.

[250]  Léon Bottou,et al.  From machine learning to machine reasoning , 2011, Machine Learning.

[251]  Yoshua Bengio,et al.  Better Mixing via Deep Representations , 2012, ICML.

[252]  Geoffrey E. Hinton,et al.  Training Recurrent Neural Networks , 2013 .

[253]  Yoshua Bengio,et al.  What regularized auto-encoders learn from the data-generating distribution , 2012, J. Mach. Learn. Res..

[254]  M. Yüksel,et al.  A Ph.D. Thesis , 2014 .

[255]  Razvan Pascanu,et al.  Revisiting Natural Gradient for Deep Networks , 2013, ICLR.

[256]  Jason Morton,et al.  When Does a Mixture of Products Contain a Product of Mixtures? , 2012, SIAM J. Discret. Math..