A New Class of Incremental Gradient Methods for Least Squares Problems

The least mean squares (LMS) method for linear least squares problems differs from the steepest descent method in that it processes data blocks one-by-one, with intermediate adjustment of the parameter vector under optimization. This mode of operation often leads to faster convergence when far from the eventual limit and to slower (sublinear) convergence when close to the optimal solution. We embed both LMS and steepest descent, as well as other intermediate methods, within a one-parameter class of algorithms, and we propose a hybrid class of methods that combine the faster early convergence rate of LMS with the faster ultimate linear convergence rate of steepest descent. These methods are well suited for neural network training problems with large data sets. Furthermore, these methods allow the effective use of scaling based, for example, on diagonal or other approximations of the Hessian matrix.
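
For intuition, the sketch below contrasts the two extremes the abstract refers to on a synthetic linear least squares problem: a pure incremental (LMS-style) pass that updates the parameter vector after every data block, and batch steepest descent, which updates only after a full pass over the data. A naive hybrid that hard-switches from incremental to batch updates after a fixed number of epochs is included purely as an illustration of the trade-off; it does not reproduce the paper's one-parameter family, its blending of the two regimes, or its stepsize rules. The function names, stepsizes, and switching epoch are illustrative assumptions.

```python
# Illustrative sketch only: contrasts LMS-style incremental updates with
# batch steepest descent for min_x 0.5 * ||A x - b||^2, plus a naive hybrid
# that switches from incremental to batch updates after a fixed number of
# epochs.  The paper's one-parameter family is NOT reproduced here.
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 5                      # number of data blocks (rows) and parameters
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n) + 0.1 * rng.standard_normal(m)


def lms_epoch(x, A, b, step):
    """One pass over the data, updating x after every block (LMS / incremental)."""
    for a_i, b_i in zip(A, b):
        x = x - step * (a_i @ x - b_i) * a_i
    return x


def steepest_descent_step(x, A, b, step):
    """One batch gradient step on f(x) = 0.5 * ||A x - b||^2."""
    return x - step * A.T @ (A @ x - b)


def hybrid(A, b, epochs=50, switch_epoch=10, inc_step=1e-2, batch_step=1e-3):
    """Incremental updates early (fast initial progress far from the solution),
    batch updates later (linear convergence near the solution).  The hard
    switch and the constant stepsizes are assumptions made for illustration."""
    x = np.zeros(A.shape[1])
    for k in range(epochs):
        if k < switch_epoch:
            x = lms_epoch(x, A, b, inc_step)
        else:
            x = steepest_descent_step(x, A, b, batch_step)
    return x


x_hat = hybrid(A, b)
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
print("distance to least-squares solution:", np.linalg.norm(x_hat - x_star))
```

In the methods proposed in the paper, the two regimes are combined through a single parameter and suitable stepsize choices rather than the hard switch used above; the switch is only meant to make the contrast between early incremental progress and ultimate linear convergence visible.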
