Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent

We propose a generic method for iteratively approximating various second-order gradient steps (Newton, Gauss-Newton, Levenberg-Marquardt, and natural gradient) in linear time per iteration, using special curvature matrix-vector products that can be computed in O(n). Two recent acceleration techniques for on-line learning, matrix momentum and stochastic meta-descent (SMD), implement this approach. Since both were originally derived by very different routes, this connection offers fresh insight into their operation and leads to further improvements to SMD.
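As a rough illustration of the kind of O(n) curvature matrix-vector product the abstract refers to, here is a minimal JAX sketch of Pearlmutter's fast Hessian-vector product and a matching Gauss-Newton-vector product. This is not the paper's implementation; the toy model, loss, and all function names are invented for illustration.

```python
# Minimal sketch of O(n) curvature matrix-vector products (illustrative,
# not the paper's code): the Hessian and Gauss-Newton matrices are never
# formed; only their products with a vector v are computed.
import jax
import jax.numpy as jnp

def loss(w, x, y):
    """Toy squared-error loss for a one-layer tanh model (hypothetical)."""
    pred = jnp.tanh(x @ w)
    return 0.5 * jnp.sum((pred - y) ** 2)

def hessian_vector_product(w, v, x, y):
    """Pearlmutter's trick: Hv = d/dr grad(w + r*v) at r = 0.

    Implemented as a forward-mode derivative (JVP) of the gradient, so it
    costs a small constant multiple of one gradient evaluation, i.e. O(n),
    without ever materializing the n-by-n Hessian.
    """
    grad_fn = lambda w_: jax.grad(loss)(w_, x, y)
    _, hv = jax.jvp(grad_fn, (w,), (v,))
    return hv

def gauss_newton_vector_product(w, v, x, y):
    """Gauss-Newton product Gv = J^T H_L J v at the same O(n) cost.

    J is the model Jacobian w.r.t. the weights; for the squared-error
    loss above, the output Hessian H_L is the identity, so Gv = J^T (J v):
    one forward-mode pass for J v, one reverse-mode pass for J^T(.).
    """
    model = lambda w_: jnp.tanh(x @ w_)
    _, jv = jax.jvp(model, (w,), (v,))   # J v        (forward mode)
    _, vjp_fn = jax.vjp(model, w)
    (gv,) = vjp_fn(jv)                   # J^T (J v)  (reverse mode)
    return gv

# Usage: both products on random toy data.
key = jax.random.PRNGKey(0)
w = jax.random.normal(key, (5,))
x = jax.random.normal(key, (16, 5))
y = jnp.zeros(16)
v = jnp.ones(5)
print(hessian_vector_product(w, v, x, y))
print(gauss_newton_vector_product(w, v, x, y))
```

The same jvp/vjp composition pattern, applied to the model's log-likelihood instead of its output, would give a Fisher-information-vector product of the sort needed for natural-gradient steps.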
