On the importance of initialization and momentum in deep learning
暂无分享,去创建一个
Geoffrey E. Hinton | Ilya Sutskever | George E. Dahl | James Martens | Ilya Sutskever | James Martens | I. Sutskever
[1] Boris Polyak. Some methods of speeding up the convergence of iteration methods , 1964 .
[2] John E. Moody,et al. Towards Faster Stochastic Gradient Search , 1991, NIPS.
[3] W. Wiegerinck,et al. Stochastic dynamics of learning with momentum in neural networks , 1994 .
[4] Yoshua Bengio,et al. Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.
[5] Genevieve Orr,et al. Dynamics and algorithms for stochastic search , 1996 .
[6] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[7] J. van Leeuwen,et al. Neural Networks: Tricks of the Trade , 2002, Lecture Notes in Computer Science.
[8] Yann LeCun,et al. Large Scale Online Learning , 2003, NIPS.
[9] Harald Haas,et al. Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication , 2004, Science.
[10] Yurii Nesterov,et al. Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.
[11] Yoshua Bengio,et al. Greedy Layer-Wise Training of Deep Networks , 2006, NIPS.
[12] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.
[13] Yee Whye Teh,et al. A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.
[14] James Martens,et al. Deep learning via Hessian-free optimization , 2010, ICML.
[15] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.
[16] Ilya Sutskever,et al. Learning Recurrent Neural Networks with Hessian-Free Optimization , 2011, ICML.
[17] Ilya Sutskever,et al. SUBWORD LANGUAGE MODELING WITH NEURAL NETWORKS , 2011 .
[18] O. Chapelle. Improved Preconditioner for Hessian Free Optimization , 2011 .
[19] Geoffrey E. Hinton,et al. Generating Text with Recurrent Neural Networks , 2011, ICML.
[20] Ohad Shamir,et al. Better Mini-Batch Algorithms via Accelerated Gradient Methods , 2011, NIPS.
[21] Guanghui Lan,et al. An optimal method for stochastic composite optimization , 2011, Mathematical Programming.
[22] Dong Yu,et al. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[23] Alex Graves,et al. Sequence Transduction with Recurrent Neural Networks , 2012, ArXiv.
[24] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[25] Grgoire Montavon,et al. Neural Networks: Tricks of the Trade , 2012, Lecture Notes in Computer Science.
[26] Klaus-Robert Müller,et al. Efficient BackProp , 2012, Neural Networks: Tricks of the Trade.
[27] Tapani Raiko,et al. Deep Learning Made Easier by Linear Transformations in Perceptrons , 2012, AISTATS.
[28] Geoffrey E. Hinton,et al. Acoustic Modeling Using Deep Belief Networks , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[29] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition , 2012 .
[30] Ilya Sutskever,et al. Training Deep and Recurrent Networks with Hessian-Free Optimization , 2012, Neural Networks: Tricks of the Trade.