A robust adaptive stochastic gradient method for deep learning
Çaglar Gülçehre | Marcin Moczulski | Jose Sotelo | Yoshua Bengio
[1] Xi Chen, et al. Variance Reduction for Stochastic Gradient Optimization, 2013, NIPS.
[2] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[3] Yuri Levin, et al. Directional Newton methods in n variables, 2002, Math. Comput.
[4] Tom Schaul, et al. No more pesky learning rates, 2012, ICML.
[5] Yann LeCun, et al. Improving the convergence of back-propagation learning with second-order methods, 1989.
[6] Yoshua Bengio, et al. Blocks and Fuel: Frameworks for deep learning, 2015, ArXiv.
[7] John Salvatier, et al. Theano: A Python framework for fast computation of mathematical expressions, 2016, ArXiv.
[8] Barak A. Pearlmutter, et al. Automatic Learning Rate Maximization by On-Line Estimation of the Hessian's Eigenvectors, 1992, NIPS.
[9] Razvan Pascanu, et al. Theano: new features and speed improvements, 2012, ArXiv.
[10] Tom Schaul, et al. Adaptive learning rates and parallelization for stochastic, sparse, non-smooth gradients, 2013, ICLR.
[11] Z. Bai, et al. Directional secant method for nonlinear equations, 2005.
[12] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, ArXiv.
[13] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[14] Yoshua Bengio, et al. Learning long-term dependencies with gradient descent is difficult, 1994, IEEE Trans. Neural Networks.
[15] H. Robbins. A Stochastic Approximation Method, 1951.
[16] Marcus Liwicki, et al. IAM-OnDB - an on-line English sentence database acquired from handwritten text on a whiteboard, 2005, Eighth International Conference on Document Analysis and Recognition (ICDAR'05).
[17] Yoshua Bengio, et al. Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies, 2001.
[18] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[19] Yoshua Bengio, et al. ADASECANT: Robust Adaptive Secant Method for Stochastic Gradient, 2014, ArXiv.
[20] Alex Graves, et al. Generating Sequences With Recurrent Neural Networks, 2013, ArXiv.
[21] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.
[22] Klaus-Robert Müller, et al. Efficient BackProp, 2012, Neural Networks: Tricks of the Trade.
[23] Razvan Pascanu, et al. Pylearn2: a machine learning research library, 2013, ArXiv.
[24] Yoshua Bengio, et al. Maxout Networks, 2013, ICML.
[25] Nicol N. Schraudolph, et al. Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent, 2002, Neural Computation.
[26] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.
[27] Ilya Sutskever, et al. Subword Language Modeling with Neural Networks, 2011.