Gradient Convergence in Gradient Methods with Errors

We consider the gradient method $x_{t+1}=x_t+\gamma_t(s_t+w_t)$, where $s_t$ is a descent direction of a function $f:\mathbb{R}^n\to\mathbb{R}$ and $w_t$ is a deterministic or stochastic error. We assume that $\nabla f$ is Lipschitz continuous, that the stepsize $\gamma_t$ diminishes to 0, and that $s_t$ and $w_t$ satisfy standard conditions. We show that either $f(x_t)\to-\infty$ or $f(x_t)$ converges to a finite value and $\nabla f(x_t)\to 0$ (with probability 1 in the stochastic case), and in doing so, we remove various boundedness conditions that are assumed in existing results, such as boundedness of $f$ from below, boundedness of $\nabla f(x_t)$, or boundedness of $x_t$.
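A minimal sketch of the iteration described above, not taken from the paper: the objective $f(x)=\tfrac12\|x\|^2$, the noise scale, and the stepsize rule $\gamma_t = 1/(t+1)$ are illustrative assumptions chosen so that $\gamma_t \to 0$ with $\sum_t \gamma_t = \infty$; the descent direction is taken as $s_t = -\nabla f(x_t)$ and $w_t$ is zero-mean Gaussian noise.

```python
# Sketch of the gradient method with errors, x_{t+1} = x_t + gamma_t (s_t + w_t),
# on an illustrative quadratic f(x) = 0.5 * ||x||^2 (assumption, not from the paper).
import numpy as np

rng = np.random.default_rng(0)

def grad_f(x):
    # Gradient of the illustrative objective f(x) = 0.5 * ||x||^2.
    return x

x = np.array([5.0, -3.0])
for t in range(10_000):
    gamma = 1.0 / (t + 1)                     # diminishing stepsize, sum gamma_t = inf
    s = -grad_f(x)                            # descent direction s_t
    w = rng.normal(scale=0.5, size=x.shape)   # stochastic error w_t
    x = x + gamma * (s + w)

# Under the paper's conditions one expects grad f(x_t) -> 0 (w.p. 1 in the stochastic case);
# here the gradient norm should be small after many iterations.
print(x, np.linalg.norm(grad_f(x)))
```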
