[1] Nicol N. Schraudolph, et al. Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent, 2002, Neural Computation.
[2] Sham M. Kakade, et al. Competing with the Empirical Risk Minimizer in a Single Pass, 2014, COLT.
[3] Ilya Sutskever, et al. Estimating the Hessian by Back-propagating Curvature, 2012, ICML.
[4] Richard H. Bartels, et al. Algorithm 432 [C2]: Solution of the matrix equation AX + XB = C [F4], 1972, Commun. ACM.
[5] Kenji Fukumizu, et al. Adaptive natural gradient learning algorithms for various stochastic models, 2000, Neural Networks.
[6] O. Chapelle. Improved Preconditioner for Hessian Free Optimization, 2011.
[7] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.
[8] James Martens, et al. New Insights and Perspectives on the Natural Gradient Method, 2014, J. Mach. Learn. Res.
[9] Yann LeCun, et al. Improving the convergence of back-propagation learning with second-order methods, 1989.
[10] James Martens, et al. Deep learning via Hessian-free optimization, 2010, ICML.
[11] Tom Heskes, et al. On Natural Learning and Pruning in Multilayered Perceptrons, 2000, Neural Computation.
[12] Razvan Pascanu, et al. Revisiting Natural Gradient for Deep Networks, 2013, ICLR.
[13] Noboru Murata, et al. A Statistical Study on On-line Learning, 1999.
[14] Te-son Kuo, et al. Trace bounds on the solution of the algebraic matrix Riccati and Lyapunov equation, 1986.
[15] John Moody, et al. Learning rate schedules for faster stochastic gradient search, 1992, Neural Networks for Signal Processing II Proceedings of the 1992 IEEE Workshop.
[16] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[17] Ilya Sutskever, et al. Learning Recurrent Neural Networks with Hessian-Free Optimization, 2011, ICML.
[18] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[19] N. Komaroff. Upper summation and product bounds for solution eigenvalues of the Lyapunov matrix equation, 1992.
[20] Patrick Gallinari, et al. SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent, 2009, J. Mach. Learn. Res.
[21] N. Komaroff. Simultaneous eigenvalue lower bounds for the Lyapunov matrix equation, 1988.
[22] D. K. Smith, et al. Numerical Optimization, 2001, J. Oper. Res. Soc.
[23] Francis R. Bach, et al. From Averaging to Acceleration, There is Only a Step-size, 2015, COLT.
[24] Grégoire Montavon, et al. Neural Networks: Tricks of the Trade, 2012, Lecture Notes in Computer Science.
[25] Nicolas Le Roux, et al. Topmoumoute Online Natural Gradient Algorithm, 2007, NIPS.
[26] Ruslan Salakhutdinov, et al. Scaling up Natural Gradient by Sparsely Factorizing the Inverse Fisher Matrix, 2015, ICML.
[27] Ilya Sutskever, et al. Training Deep and Recurrent Networks with Hessian-Free Optimization, 2012, Neural Networks: Tricks of the Trade.
[28] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[29] Yann Ollivier, et al. Riemannian metrics for neural networks I: feedforward networks, 2013, arXiv:1303.0818.
[30] Razvan Pascanu, et al. Natural Neural Networks, 2015, NIPS.
[31] Francis R. Bach, et al. Constant Step Size Least-Mean-Square: Bias-Variance Trade-offs and Optimal Sampling Distributions, 2014, AISTATS.
[32] Roger B. Grosse, et al. Optimizing Neural Networks with Kronecker-factored Approximate Curvature, 2015, ICML.
[33] Young Soo Moon, et al. Bounds in algebraic Riccati and Lyapunov equations: a survey and some new results, 1996.
[34] Mark W. Schmidt, et al. A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets, 2012, NIPS.
[35] Jorge J. Moré, et al. The Levenberg-Marquardt algorithm: Implementation and theory, 1977.
[36] Tom Schaul, et al. No more pesky learning rates, 2012, ICML.
[37] Shun-ichi Amari, et al. Adaptive blind signal processing-neural network approaches, 1998, Proc. IEEE.
[38] Elad Hazan, et al. Logarithmic regret algorithms for online convex optimization, 2006, Machine Learning.
[39] Andrew W. Fitzgibbon, et al. A fast natural Newton method, 2010, ICML.
[40] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, arXiv.
[41] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[42] Daniel Povey, et al. Krylov Subspace Descent for Deep Learning, 2011, AISTATS.
[43] Léon Bottou, et al. On-line learning for very large data sets, 2005.