[1] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[2] Nicol N. Schraudolph, et al. Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent, 2002, Neural Computation.
[3] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines, 2011, ACM TIST.
[4] Di He, et al. A Gram-Gauss-Newton Method Learning Overparameterized Deep Neural Networks for Regression Problems, 2019, arXiv.
[5] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[6] Ning Qian, et al. On the momentum term in gradient descent learning algorithms, 1999, Neural Networks.
[7] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[8] Jorge J. Moré, et al. The Levenberg-Marquardt algorithm: Implementation and theory, 1977.
[9] Jorge Nocedal, et al. Adaptive Sampling Strategies for Stochastic Optimization, 2017, SIAM J. Optim.
[10] Guillaume Hennequin, et al. Exact natural gradient in deep linear networks and its application to the nonlinear case, 2018, NeurIPS.
[11] Stephen J. Wright, et al. Numerical Optimization, 1999, Springer.
[12] Roger B. Grosse, et al. Optimizing Neural Networks with Kronecker-factored Approximate Curvature, 2015, ICML.
[13] Richard Socher, et al. Improving Generalization Performance by Switching from Adam to SGD, 2017, arXiv.
[14] David Barber, et al. Practical Gauss-Newton Optimisation for Deep Learning, 2017, ICML.
[15] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.
[16] James Martens, et al. Deep learning via Hessian-free optimization, 2010, ICML.
[17] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[18] Ryan Kiros, et al. Training Neural Networks with Stochastic Hessian-Free Optimization, 2013, ICLR.
[19] Daniel Povey, et al. Krylov Subspace Descent for Deep Learning, 2011, AISTATS.
[20] Guodong Zhang, et al. Fast Convergence of Natural Gradient Descent for Overparameterized Neural Networks, 2019, NeurIPS.