Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent
[1] Kenneth Levenberg. A Method for the Solution of Certain Non-Linear Problems in Least Squares , 1944 .
[2] D. Marquardt. An Algorithm for Least-Squares Estimation of Nonlinear Parameters , 1963 .
[3] Shun-ichi Amari,et al. Differential-geometrical methods in statistics , 1985 .
[4] P. J. Werbos,et al. Backpropagation: past and future , 1988, IEEE 1988 International Conference on Neural Networks.
[5] Sharad Singhal,et al. Training Multilayer Perceptrons with the Extended Kalman Algorithm , 1988, NIPS.
[6] F. A. Seiler,et al. Numerical Recipes in C: The Art of Scientific Computing , 1989 .
[7] William H. Press,et al. Numerical recipes in C (2nd ed.): the art of scientific computing , 1992 .
[8] M. F. Møller,et al. Exact Calculation of the Product of the Hessian Matrix of Feed-Forward Network Error Functions and a Vector in O(N) Time , 1993 .
[9] Todd K. Leen,et al. Optimal Stochastic Search and Adaptive Momentum , 1993, NIPS.
[10] Heekuck Oh,et al. Neural Networks for Pattern Recognition , 1993, Adv. Comput..
[11] Barak A. Pearlmutter. Fast Exact Multiplication by the Hessian , 1994, Neural Computation.
[12] Terrence J. Sejnowski,et al. Tempering Backpropagation Networks: Not All Weights are Created Equal , 1995, NIPS.
[13] Peter Auer,et al. Exponentially many local minima for single neurons , 1995, NIPS.
[14] Manfred K. Warmuth,et al. Additive versus exponentiated gradient updates for linear prediction , 1995, STOC '95.
[15] Manfred K. Warmuth,et al. Worst-case Loss Bounds for Single Neurons , 1995, NIPS.
[16] Todd K. Leen,et al. Using Curvature Information for Fast Stochastic Search , 1996, NIPS.
[17] Mance E. Harmon,et al. Multi-Agent Residual Advantage Learning with General Function Approximation , 1996 .
[19] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.
[20] Shun-ichi Amari,et al. Complexity Issues in Natural Gradient Descent Method for Training Multilayer Perceptrons , 1998, Neural Computation.
[21] Nicol N. Schraudolph. Online Learning with Adaptive Local Step Sizes , 1999 .
[22] M. Rattray,et al. Incorporating curvature information into on-line learning , 1999 .
[23] Nicol N. Schraudolph,et al. Local Gain Adaptation in Stochastic Gradient Descent , 1999 .
[24] Nicol N. Schraudolph,et al. Online Independent Component Analysis with Local Learning Rate Adaptation , 1999, NIPS.
[25] M. Rattray,et al. Matrix Momentum for Practical Natural Gradient Learning , 1999 .
[26] Gavin C. Cawley,et al. On a Fast, Compact Approximation of the Exponential Function , 2000, Neural Computation.
[27] Motoaki Kawanabe,et al. On-line learning in changing environments with applications in supervised and unsupervised learning , 2002, Neural Networks.
[28] W. Press,et al. Numerical Recipes in C++: The Art of Scientific Computing (2nd edn) , 2003 .
[29] Nicol N. Schraudolph,et al. Gradient-based manipulation of nonparametric entropy estimates , 2004, IEEE Transactions on Neural Networks.