Gradient convergence in gradient methods

For the classical gradient method $x_{t+1} = x_t - \gamma_t \nabla f(x_t)$ and several deterministic and stochastic variants, we discuss the issue of convergence of the gradient sequence $\nabla f(x_t)$ and the attendant issue of stationarity of limit points of $x_t$. We assume that $\nabla f$ is Lipschitz continuous, and that the stepsize $\gamma_t$ diminishes to 0 and satisfies standard stochastic approximation conditions. We show that either $f(x_t) \to -\infty$ or else $f(x_t)$ converges to a finite value and $\nabla f(x_t) \to 0$ (with probability 1 in the stochastic case). Existing results assume various boundedness conditions such as boundedness from below of $f$, or boundedness of $\nabla f(x_t)$, or boundedness of $x_t$.
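
As a rough numerical illustration of the deterministic case, the sketch below runs the iteration $x_{t+1} = x_t - \gamma_t \nabla f(x_t)$ with a diminishing stepsize. Everything specific in it is an assumption made for illustration and is not taken from the paper: the objective $f(x) = \log(1 + x^2)$ (nonconvex, with a Lipschitz continuous gradient), the stepsize rule $\gamma_t = c/(t+1)$ (which satisfies the usual conditions $\sum_t \gamma_t = \infty$ and $\sum_t \gamma_t^2 < \infty$), and the names `grad_f` and `gradient_method`.

```python
def grad_f(x):
    # Gradient of the illustrative objective f(x) = log(1 + x^2).
    # Its derivative, 2(1 - x^2) / (1 + x^2)^2, is bounded by 2 in
    # absolute value, so grad_f is Lipschitz continuous (assumed example).
    return 2.0 * x / (1.0 + x * x)


def gradient_method(x0, num_steps, c=1.0):
    # Iterates x_{t+1} = x_t - gamma_t * grad_f(x_t) with the
    # hypothetical diminishing stepsize gamma_t = c / (t + 1).
    x = x0
    grad_norms = []
    for t in range(num_steps):
        gamma_t = c / (t + 1)
        g = grad_f(x)
        grad_norms.append(abs(g))  # record |grad f(x_t)| before the step
        x = x - gamma_t * g
    return x, grad_norms


x_final, grad_norms = gradient_method(x0=3.0, num_steps=5000)
print("final iterate:", x_final)
print("final gradient magnitude:", grad_norms[-1])
```

In this example $f$ is bounded below, so the run falls in the second case of the result: the recorded gradient magnitudes shrink toward 0. They need not decrease monotonically; here the gradient first grows as the iterate moves from 3 toward 1 and only then decays.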
