Paper Information - Unsupervised Feature Learning and Deep Learning: A Review and New Perspectives

Unsupervised Feature Learning and Deep Learning: A Review and New Perspectives

The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although domain knowledge can be used to help design representations, learning can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, manifold learning, and deep learning. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation and manifold learning.

Pascal Vincent | Yoshua Bengio | Aaron C. Courville

[1] Karl Pearson F.R.S.. LIII. On lines and planes of closest fit to systems of points in space , 1901 .

[2] H. Hotelling. Analysis of a complex of statistical variables into principal components. , 1933 .

[3] D. Hubel,et al. Receptive fields of single neurones in the cat's striate cortex , 1959, The Journal of physiology.

[4] J. Besag. Statistical Analysis of Non-Lattice Data , 1975 .

[5] D. Rubin,et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .

[6] Kunihiko Fukushima,et al. Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position , 1982, Pattern Recognit..

[7] Johan Håstad,et al. Almost optimal lower bounds for small depth circuits , 1986, STOC '86.

[8] Lawrence D. Jackel,et al. Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[9] Jordan B. Pollack,et al. Recursive Distributed Representations , 1990, Artif. Intell..

[10] Christian Jutten,et al. Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture , 1991, Signal Process..

[11] Yann LeCun,et al. Tangent Prop - A Formalism for Specifying Selected Invariances in an Adaptive Network , 1991, NIPS.

[12] T. Poggio,et al. Recognition and Structure from one 2D Model View: Observations on Prototypes, Object Classes and Symmetries , 1992 .

[13] Yann LeCun,et al. Efficient Pattern Recognition Using a New Transformation Distance , 1992, NIPS.

[14] Radford M. Neal. Connectionist Learning of Belief Networks , 1992, Artif. Intell..

[15] Geoffrey E. Hinton,et al. Autoencoders, Minimum Description Length and Helmholtz Free Energy , 1993, NIPS.

[16] Geoffrey E. Hinton,et al. Learning Mixture Models of Spatial Coherence , 1993, Neural Computation.

[17] Rich Caruana,et al. Learning Many Related Tasks at the Same Time with Backpropagation , 1994, NIPS.

[18] Pierre Comon,et al. Independent component analysis, A new concept? , 1994, Signal Process..

[19] Yoshua Bengio,et al. Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.

[20] Henry S. Baird,et al. Document image defect models , 1995 .

[21] David J. Field,et al. Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.

[22] David J. Field,et al. Sparse coding with an overcomplete basis set: A strategy employed by V1? , 1997, Vision Research.

[23] Sam T. Roweis,et al. EM Algorithms for PCA and Sensible PCA , 1997, NIPS 1997.

[24] Alessandro Sperduti,et al. On the Efficient Classification of Data Structures by Neural Networks , 1997, IJCAI.

[25] Terrence J. Sejnowski,et al. The “independent components” of natural scenes are edge filters , 1997, Vision Research.

[26] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[27] Bernhard Schölkopf,et al. Nonlinear Component Analysis as a Kernel Eigenvalue Problem , 1998, Neural Computation.

[28] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.

[29] Alessandro Sperduti,et al. A general framework for adaptive processing of data structures , 1998, IEEE Trans. Neural Networks.

[30] T. Poggio,et al. Hierarchical models of object recognition in cortex , 1999, Nature Neuroscience.

[31] L. Younes. On the convergence of markovian stochastic algorithms with rapidly decreasing ergodicity rates , 1999 .

[32] David G. Lowe,et al. Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[33] J. Tenenbaum,et al. A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.

[34] S T Roweis,et al. Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[35] Aapo Hyvärinen,et al. Emergence of Phase- and Shift-Invariant Features by Decomposition of Natural Images into Independent Feature Subspaces , 2000, Neural Computation.

[36] Aapo Hyvärinen,et al. Topographic Independent Component Analysis , 2001, Neural Computation.

[37] Geoffrey E. Hinton,et al. Learning Sparse Topographic Representations with Products of Student-t Distributions , 2002, NIPS.

[38] Geoffrey E. Hinton,et al. Stochastic Neighbor Embedding , 2002, NIPS.

[39] Aapo Hyvärinen,et al. Temporal Coherence, Natural Image Sequences, and the Visual Cortex , 2002, NIPS.

[40] Pascal Vincent,et al. Manifold Parzen Windows , 2002, NIPS.

[41] Terrence J. Sejnowski,et al. Slow Feature Analysis: Unsupervised Learning of Invariances , 2002, Neural Computation.

[42] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[43] J. van Leeuwen,et al. Neural Networks: Tricks of the Trade , 2002, Lecture Notes in Computer Science.

[44] Matthew Brand,et al. Charting a Manifold , 2002, NIPS.

[45] Patrice Y. Simard,et al. Best practices for convolutional neural networks applied to visual document analysis , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..

[46] Nicolas Le Roux,et al. Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering , 2003, NIPS.

[47] Mikhail Belkin,et al. Laplacian Eigenmaps for Dimensionality Reduction and Data Representation , 2003, Neural Computation.

[48] D. Donoho,et al. Hessian Eigenmaps : new locally linear embedding techniques for high-dimensional data , 2003 .

[49] Konrad Paul Kording,et al. How are complex cell properties adapted to the statistics of natural stimuli? , 2004, Journal of neurophysiology.

[50] Kunihiko Fukushima,et al. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position , 1980, Biological Cybernetics.

[51] Alan L. Yuille,et al. The Convergence of Contrastive Divergences , 2004, NIPS.

[52] Kilian Q. Weinberger,et al. Unsupervised Learning of Image Manifolds by Semidefinite Programming , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[53] Yoshua Bengio,et al. Non-Local Manifold Tangent Learning , 2004, NIPS.

[54] H. Bourlard,et al. Auto-association by multilayer perceptrons and singular value decomposition , 1988, Biological Cybernetics.

[55] Lawrence Cayton,et al. Algorithms for manifold learning , 2005 .

[56] Laurenz Wiskott,et al. Slow feature analysis yields a rich repertoire of complex cell properties. , 2005, Journal of vision.

[57] Johan Håstad,et al. On the power of small-depth threshold circuits , 1991, computational complexity.

[58] Aapo Hyvärinen,et al. Estimation of Non-Normalized Statistical Models by Score Matching , 2005, J. Mach. Learn. Res..

[59] Pascal Vincent,et al. Non-Local Manifold Parzen Windows , 2005, NIPS.

[60] Nicolas Le Roux,et al. The Curse of Highly Variable Functions for Local Kernel Machines , 2005, NIPS.

[61] Miguel Á. Carreira-Perpiñán,et al. On Contrastive Divergence Learning , 2005, AISTATS.

[62] Yoshua Bengio,et al. Greedy Layer-Wise Training of Deep Networks , 2006, NIPS.

[63] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[64] Daniel Marcu,et al. Domain Adaptation for Statistical Classifiers , 2006, J. Artif. Intell. Res..

[65] Cordelia Schmid,et al. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[66] Yee Whye Teh,et al. A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[67] Marc'Aurelio Ranzato,et al. Efficient Learning of Sparse Representations with an Energy-Based Model , 2006, NIPS.

[68] Max Welling Donald,et al. Products of Experts , 2007 .

[69] Geoffrey E. Hinton,et al. Restricted Boltzmann machines for collaborative filtering , 2007, ICML '07.

[70] Roger B. Grosse,et al. Shift-Invariance Sparse Coding for Audio Classification , 2007, UAI.

[71] Honglak Lee,et al. Sparse deep belief net model for visual area V2 , 2007, NIPS.

[72] Bruno A. Olshausen,et al. Learning Horizontal Connections in a Sparse Coding Model of Natural Images , 2007, NIPS.

[73] Marc'Aurelio Ranzato,et al. Sparse Feature Learning for Deep Belief Networks , 2007, NIPS.

[74] Nicolas Le Roux,et al. Learning the 2-D Topology of Images , 2007, NIPS.

[75] Nicolas Le Roux,et al. Topmoumoute Online Natural Gradient Algorithm , 2007, NIPS.

[76] Yoshua Bengio,et al. Scaling learning algorithms towards AI , 2007 .

[77] Thomas Serre,et al. Robust Object Recognition with Cortex-Like Mechanisms , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[78] Rajat Raina,et al. Self-taught learning: transfer learning from unlabeled data , 2007, ICML '07.

[79] Aapo Hyvärinen,et al. Some extensions of score matching , 2007, Comput. Stat. Data Anal..

[80] Yoshua Bengio,et al. Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[81] Geoffrey E. Hinton,et al. The Recurrent Temporal Restricted Boltzmann Machine , 2008, NIPS.

[82] Geoffrey E. Hinton,et al. Visualizing Data using t-SNE , 2008 .

[83] Ruslan Salakhutdinov,et al. Evaluating probabilities under high-dimensional latent variable models , 2008, NIPS.

[84] Geoffrey E. Hinton,et al. Generative versus discriminative training of RBMs for classification of fMRI images , 2008, NIPS.

[85] Jason Weston,et al. A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.

[86] Bruno A. Olshausen,et al. Learning Transformational Invariants from Natural Movies , 2008, NIPS.

[87] Tijmen Tieleman,et al. Training restricted Boltzmann machines using approximations to the likelihood gradient , 2008, ICML '08.

[88] Jason Weston,et al. Deep learning via semi-supervised embedding , 2008, ICML '08.

[89] Yoshua Bengio,et al. Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[90] Yoshua Bengio,et al. Classification using discriminative restricted Boltzmann machines , 2008, ICML '08.

[91] H. Sebastian Seung,et al. Natural Image Denoising with Convolutional Networks , 2008, NIPS.

[92] David M. Bradley,et al. Differentiable Sparse Coding , 2008, NIPS.

[93] Aapo Hyvärinen,et al. Optimal Approximation of Signal Priors , 2008, Neural Computation.

[94] Geoffrey E. Hinton,et al. Using fast weights to improve persistent contrastive divergence , 2009, ICML '09.

[95] Yoshua Bengio,et al. Slow, Decorrelated Features for Pretraining Complex Cell-like Networks , 2009, NIPS.

[96] Yoshua Bengio,et al. Exploring Strategies for Training Deep Neural Networks , 2009, J. Mach. Learn. Res..

[97] Honglak Lee,et al. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.

[98] Yann LeCun,et al. What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[99] Quoc V. Le,et al. Measuring Invariances in Deep Networks , 2009, NIPS.

[100] Geoffrey E. Hinton,et al. Factored conditional restricted Boltzmann Machines for modeling motion style , 2009, ICML '09.

[101] Aapo Hyvärinen,et al. Natural Image Statistics - A Probabilistic Approach to Early Computational Vision , 2009, Computational Imaging and Vision.

[102] R. Fergus,et al. Learning invariant features through topographic filter maps , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[103] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .

[104] Max Welling,et al. Herding Dynamic Weights for Partially Observed Random Field Models , 2009, UAI.

[105] Pascal Vincent,et al. Deep Learning using Robust Interdependent Codes , 2009, AISTATS.

[106] Geoffrey E. Hinton,et al. Deep Boltzmann Machines , 2009, AISTATS.

[107] Yihong Gong,et al. Nonlinear Learning using Local Coordinate Coding , 2009, NIPS.

[108] Yoshua Bengio,et al. Justifying and Generalizing Contrastive Divergence , 2009, Neural Computation.

[109] Laurens van der Maaten,et al. Learning a Parametric Embedding by Preserving Local Structure , 2009, AISTATS.

[110] Honglak Lee,et al. Unsupervised feature learning for audio classification using convolutional deep belief networks , 2009, NIPS.

[111] Ruslan Salakhutdinov,et al. Learning in Markov Random Fields using Tempered Transitions , 2009, NIPS.

[112] Geoffrey E. Hinton,et al. Semantic hashing , 2009, Int. J. Approx. Reason..

[113] Hossein Mobahi,et al. Deep learning from temporal coherence in video , 2009, ICML '09.

[114] Hugo Larochelle,et al. Efficient Learning of Deep Boltzmann Machines , 2010, AISTATS.

[115] Aaron C. Courville,et al. Understanding Representations Learned in Deep Architectures , 2010 .

[116] Quoc V. Le,et al. Tiled convolutional neural networks , 2010, NIPS.

[117] Yoshua Bengio,et al. Why Does Unsupervised Pre-training Help Deep Learning? , 2010, AISTATS.

[118] Geoffrey E. Hinton,et al. Learning to Represent Spatial Transformations with Factored Higher-Order Boltzmann Machines , 2010, Neural Computation.

[119] Geoffrey E. Hinton. A Practical Guide to Training , 2010 .

[120] James Martens,et al. Deep learning via Hessian-free optimization , 2010, ICML.

[121] Yann LeCun,et al. Convolutional Learning of Spatio-temporal Features , 2010, ECCV.

[122] Ilya Sutskever,et al. On the Convergence Properties of Contrastive Divergence , 2010, AISTATS.

[123] Ruslan Salakhutdinov,et al. Learning Deep Boltzmann Machines using Adaptive MCMC , 2010, ICML.

[124] Yann LeCun,et al. Regularized estimation of image statistics by Score Matching , 2010, NIPS.

[125] Tapani Raiko,et al. Parallel tempering is efficient for learning restricted Boltzmann machines , 2010, The 2010 International Joint Conference on Neural Networks (IJCNN).

[126] Geoffrey E. Hinton,et al. Generating more realistic images using gated MRF's , 2010, NIPS.

[127] Kevin Swersky,et al. Inductive Principles for Learning Restricted Boltzmann Machines , 2010 .

[128] Christopher D. Manning,et al. Learning Continuous Phrase Representations and Syntactic Parsing with Recursive Neural Networks , 2010 .

[129] Pascal Vincent,et al. Tempered Markov Chain Monte Carlo for training of Restricted Boltzmann Machines , 2010, AISTATS.

[130] Graham W. Taylor,et al. Deconvolutional networks , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[131] Y-Lan Boureau,et al. Learning Convolutional Feature Hierarchies for Visual Recognition , 2010, NIPS.

[132] Geoffrey E. Hinton,et al. Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine , 2010, NIPS.

[133] Yoshua Bengio,et al. Decision Trees Do Not Generalize to New Variations , 2010, Comput. Intell..

[134] Geoffrey E. Hinton,et al. Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[135] Tong Zhang,et al. Improved Local Coordinate Coding using Local Tangents , 2010, ICML.

[136] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[137] Hariharan Narayanan,et al. Sample Complexity of Testing the Manifold Hypothesis , 2010, NIPS.

[138] Luca Maria Gambardella,et al. Deep, Big, Simple Neural Nets for Handwritten Digit Recognition , 2010, Neural Computation.

[139] A. Krizhevsky. Convolutional Deep Belief Networks on CIFAR-10 , 2010 .

[140] Nando de Freitas,et al. Inductive Principles for Restricted Boltzmann Machine Learning , 2010, AISTATS.

[141] Marc'Aurelio Ranzato,et al. Fast Inference in Sparse Coding Algorithms with Applications to Object Recognition , 2010, ArXiv.

[142] Nando de Freitas,et al. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning , 2010, ArXiv.

[143] Shenghuo Zhu,et al. Deep Coding Network , 2010, NIPS.

[144] Pascal Vincent,et al. Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010, J. Mach. Learn. Res..

[145] Geoffrey E. Hinton,et al. Modeling pixel means and covariances using factorized third-order boltzmann machines , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[146] Geoffrey E. Hinton,et al. Factored 3-Way Restricted Boltzmann Machines For Modeling Natural Images , 2010, AISTATS.

[147] Derek C. Rose,et al. Deep Machine Learning - A New Frontier in Artificial Intelligence Research [Research Frontier] , 2010, IEEE Computational Intelligence Magazine.

[148] Joseph F. Murray,et al. Convolutional Networks Can Learn to Generate Affinity Graphs for Image Segmentation , 2010, Neural Computation.

[149] Yoshua Bengio,et al. Algorithms for Hyper-Parameter Optimization , 2011, NIPS.

[150] Nando de Freitas,et al. Asymptotic Efficiency of Deterministic Estimators for Discrete Energy-Based Models: Ratio Matching and Pseudolikelihood , 2011, UAI.

[151] Yann LeCun,et al. Structured sparse coding via lateral inhibition , 2011, NIPS.

[152] Quoc V. Le,et al. On optimization methods for deep learning , 2011, ICML.

[153] Ilya Sutskever,et al. Learning Recurrent Neural Networks with Hessian-Free Optimization , 2011, ICML.

[154] Radford M. Neal. Probabilistic Inference Using Markov Chain Monte Carlo Methods , 2011 .

[155] Andrew Y. Ng,et al. Selecting Receptive Fields in Deep Networks , 2011, NIPS.

[156] Pascal Vincent,et al. Contractive Auto-Encoders: Explicit Invariance During Feature Extraction , 2011, ICML.

[157] Tapani Raiko,et al. Enhanced Gradient and Adaptive Learning Rate for Training Restricted Boltzmann Machines , 2011, ICML.

[158] Geoffrey E. Hinton,et al. Transforming Auto-Encoders , 2011, ICANN.

[159] Pascal Vincent,et al. Higher Order Contractive Auto-Encoder , 2011, ECML/PKDD.

[160] Yoshua Bengio,et al. On Tracking The Partition Function , 2011, NIPS.

[161] Yann LeCun,et al. Unsupervised Learning of Sparse Features for Scalable Audio Classification , 2011, ISMIR.

[162] Razvan Pascanu,et al. Deep Learners Benefit More from Out-of-Distribution Examples , 2011, AISTATS.

[163] Quoc V. Le,et al. ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning , 2011, NIPS.

[164] Yoshua Bengio,et al. Unsupervised Models of Images by Spike-and-Slab RBMs , 2011, ICML.

[165] Stéphane Mallat,et al. Group Invariant Scattering , 2011, ArXiv.

[166] Julien Mairal,et al. Structured sparsity through convex optimization , 2011, ArXiv.

[167] Katherine A. Heller,et al. Bayesian and L1 Approaches to Sparse Unsupervised Learning , 2011, ICML 2012.

[168] Yoshua Bengio,et al. Deep Sparse Rectifier Neural Networks , 2011, AISTATS.

[169] Yoshua Bengio,et al. Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach , 2011, ICML.

[170] Andrew Y. Ng,et al. The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization , 2011, ICML.

[171] Yoshua Bengio,et al. A Spike and Slab Restricted Boltzmann Machine , 2011, AISTATS.

[172] Jörg Lücke,et al. A Closed-Form EM Algorithm for Sparse Coding , 2011 .

[173] Stéphane Mallat,et al. Classification with scattering operators , 2010, CVPR 2011.

[174] Pascal Vincent,et al. A Connection Between Score Matching and Denoising Autoencoders , 2011, Neural Computation.

[175] Pascal Vincent,et al. The Manifold Tangent Classifier , 2011, NIPS.

[176] Yoshua Bengio,et al. On the Expressive Power of Deep Architectures , 2011, ALT.

[177] Francis R. Bach,et al. Structured Variable Selection with Sparsity-Inducing Norms , 2009, J. Mach. Learn. Res..

[178] Nicolas Le Roux,et al. Ask the locals: Multi-way local pooling for image recognition , 2011, 2011 International Conference on Computer Vision.

[179] Yoshua Bengio,et al. Large-Scale Learning of Embeddings with Reconstruction Sampling , 2011, ICML.

[180] Jiquan Ngiam,et al. Learning Deep Energy Models , 2011, ICML.

[181] Berin Martini,et al. Large-Scale FPGA-based Convolutional Networks , 2011 .

[182] John D. Lafferty,et al. Learning image representations from the pixel level via hierarchical sparse coding , 2011, CVPR 2011.

[183] Jason Weston,et al. Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..

[184] Miguel Lázaro-Gredilla,et al. Spike and Slab Variational Inference for Multi-Task and Multiple Kernel Learning , 2011, NIPS.

[185] Pascal Vincent,et al. Quickly Generating Representative Samples from an RBM-Derived Process , 2011, Neural Computation.

[186] Will Y. Zou. Unsupervised learning of visual invariance with temporal coherence , 2011 .

[187] Ronan Collobert,et al. Deep Learning for Efficient Discriminative Parsing , 2011, AISTATS.

[188] Geoffrey E. Hinton,et al. On deep generative models with applications to recognition , 2011, CVPR 2011.

[189] Rémi Gribonval,et al. Should Penalized Least Squares Regression be Interpreted as Maximum A Posteriori Estimation? , 2011, IEEE Transactions on Signal Processing.

[190] Yoshua Bengio,et al. Random Search for Hyper-Parameter Optimization , 2012, J. Mach. Learn. Res..

[191] James J. DiCarlo,et al. How Does the Brain Solve Visual Object Recognition? , 2012, Neuron.

[192] Yoshua Bengio,et al. Deep Learning of Representations for Unsupervised and Transfer Learning , 2011, ICML Unsupervised and Transfer Learning.

[193] Seungjin Choi,et al. Independent Component Analysis , 2009, Handbook of Natural Computing.

[194] Yoshua Bengio,et al. On Training Deep Boltzmann Machines , 2012, ArXiv.

[195] Kilian Q. Weinberger,et al. Marginalized Stacked Denoising Autoencoders , 2012 .

[196] Yoshua Bengio,et al. Unsupervised and Transfer Learning Challenge: a Deep Learning Approach , 2011, ICML Unsupervised and Transfer Learning.

[197] Yoshua Bengio,et al. Spike-and-Slab Sparse Coding for Unsupervised Feature Discovery , 2012, ArXiv.

[198] Grégoire Montavon,et al. Neural Networks: Tricks of the Trade , 2012, Lecture Notes in Computer Science.

[199] Klaus-Robert Müller,et al. Efficient BackProp , 2012, Neural Networks: Tricks of the Trade.

[200] Tapani Raiko,et al. Deep Learning Made Easier by Linear Transformations in Perceptrons , 2012, AISTATS.

[201] Geoffrey E. Hinton. A Practical Guide to Training Restricted Boltzmann Machines , 2012, Neural Networks: Tricks of the Trade.

[202] Christopher K. I. Williams,et al. Multiple Texture Boltzmann Machines , 2012, AISTATS.

[203] Léon Bottou,et al. From machine learning to machine reasoning , 2011, Machine Learning.