Using Curvature Information for Fast Stochastic Search

We present an algorithm for fast stochastic gradient descent that uses a nonlinear adaptive momentum scheme to optimize the late-time convergence rate. The algorithm makes effective use of curvature information, requires only O(n) storage and computation, and delivers convergence rates close to the theoretical optimum. We demonstrate the technique on linear networks and on large nonlinear backpropagation networks.
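As a rough illustration of how curvature information can be obtained with only O(n) storage, the sketch below approximates a Hessian-vector product by a central difference of two gradients, so the n-by-n Hessian is never formed. This is a generic technique and is not claimed to be the scheme used in the paper. The per-sample squared-error loss and the names `grad`, `hessian_vector_product`, and `eps` are assumptions made for this example.

```python
def grad(w, x, y):
    """Gradient of the per-sample loss 0.5 * (w.x - y)**2 with respect to w.

    Assumed loss for illustration only (a single linear unit).
    """
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]


def hessian_vector_product(w, v, x, y, eps=1e-4):
    """Approximate H v by a central difference of gradients.

    Only a few length-n vectors are stored, so storage and
    computation stay O(n); the full Hessian is never built.
    """
    w_plus = [wi + eps * vi for wi, vi in zip(w, v)]
    w_minus = [wi - eps * vi for wi, vi in zip(w, v)]
    g_plus = grad(w_plus, x, y)
    g_minus = grad(w_minus, x, y)
    return [(gp - gm) / (2.0 * eps) for gp, gm in zip(g_plus, g_minus)]


# Usage: for this quadratic loss the Hessian is x x^T, so H v = x * (x . v).
hv = hessian_vector_product(w=[0.5, -0.3], v=[1.0, 0.0], x=[1.0, 2.0], y=0.7)
```

For a quadratic loss the central difference is exact up to floating-point rounding. For a nonlinear network it is only an approximation, and its accuracy depends on the choice of `eps`.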
