Equilibrated adaptive learning rates for non-convex optimization
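The titular technique is an equilibration preconditioner for SGD: the per-parameter learning rate is scaled by an estimate of the Hessian row norm ||H_{i,:}||, obtained from the identity E_v[(Hv)_i^2] = ||H_{i,:}||^2 for v ~ N(0, I), so that only Hessian-vector products are required. Below is a minimal sketch on a toy quadratic, where Hv is available in closed form; the step size, damping constant `eps`, and iteration count are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Equilibrated preconditioning on a toy quadratic f(x) = 0.5 x^T A x - b^T x,
# where grad f = A x - b and the Hessian-vector product is simply H v = A v.
rng = np.random.default_rng(0)
n = 50
A = np.diag(10.0 ** rng.uniform(-2, 2, n))   # ill-conditioned diagonal Hessian
b = rng.standard_normal(n)

x = np.zeros(n)
D = np.zeros(n)            # running sum of (Hv)^2; E[(Hv)_i^2] = ||H_{i,:}||^2
lr, eps = 0.1, 1e-3        # illustrative choices, not the paper's settings

for k in range(1, 501):
    grad = A @ x - b
    v = rng.standard_normal(n)        # probe vector v ~ N(0, I)
    Hv = A @ v                        # Hessian-vector product (closed form here)
    D += Hv ** 2
    precond = np.sqrt(D / k) + eps    # equilibration preconditioner estimate
    x -= lr * grad / precond          # per-coordinate adaptive step

print("gradient norm:", np.linalg.norm(A @ x - b))
```

On this diagonal quadratic the estimated preconditioner approaches |A_ii|, so the update behaves like a damped Newton step despite the condition number of roughly 1e4.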
[1] A. van der Sluis. Condition numbers and equilibration of matrices, 1969.
[2] Charles R. Johnson, et al. A Simple Estimate of the Condition Number of a Linear System, 1995.
[3] B. Datta. Numerical Linear Algebra and Applications, 1995.
[4] Nicol N. Schraudolph. Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent, 2002, Neural Computation.
[5] Y. Saad, et al. An estimator for the diagonal of a matrix, 2007.
[6] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[7] James Martens. Deep learning via Hessian-free optimization, 2010, ICML.
[8] Biswa Nath Datta. Numerical Linear Algebra and Applications, Second Edition, 2010.
[9] W. Murray, et al. Matrix-Free Approximate Equilibration, 2011, arXiv:1110.2805.
[10] Razvan Pascanu, et al. Theano: new features and speed improvements, 2012, ArXiv.
[11] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, ArXiv.
[12] Daniel Povey, et al. Krylov Subspace Descent for Deep Learning, 2011, AISTATS.
[13] Klaus-Robert Müller, et al. Efficient BackProp, 2012, Neural Networks: Tricks of the Trade.
[14] Ilya Sutskever, et al. Estimating the Hessian by Back-propagating Curvature, 2012, ICML.
[15] Geoffrey E. Hinton, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.
[16] Yann LeCun, et al. The Loss Surfaces of Multilayer Networks, 2014, ArXiv.
[17] Tom Schaul, et al. Unit Tests for Stochastic Optimization, 2013, ICLR.
[18] Surya Ganguli, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, 2014, NIPS.
[19] Razvan Pascanu, et al. Revisiting Natural Gradient for Deep Networks, 2013, ICLR.
[20] Volkan Cevher, et al. Stochastic Spectral Descent for Restricted Boltzmann Machines, 2015, AISTATS.