Practical Recommendations for Gradient-Based Training of Deep Architectures