Gradient Convergence in Gradient Methods with Errors

We consider the gradient method $x_{t+1}=x_t+\gamma_t(s_t+w_t)$, where $s_t$ is a descent direction of a function $f:\mathbb{R}^n\to\mathbb{R}$ and $w_t$ is a deterministic or stochastic error. We assume that $\nabla f$ is Lipschitz continuous, that the stepsize $\gamma_t$ diminishes to 0, and that $s_t$ and $w_t$ satisfy standard conditions. We show that either $f(x_t)\to-\infty$ or $f(x_t)$ converges to a finite value and $\nabla f(x_t)\to 0$ (with probability 1 in the stochastic case), and in doing so, we remove various boundedness conditions that are assumed in existing results, such as boundedness of $f$ from below, boundedness of $\nabla f(x_t)$, or boundedness of $x_t$.
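A minimal sketch of the iteration described above, not taken from the paper: the objective $f(x)=\tfrac12\|x\|^2$, the noise scale, and the stepsize rule $\gamma_t = 1/(t+1)$ are illustrative assumptions chosen so that $\gamma_t \to 0$ with $\sum_t \gamma_t = \infty$; the descent direction is taken as $s_t = -\nabla f(x_t)$ and $w_t$ is zero-mean Gaussian noise.

```python
# Sketch of the gradient method with errors, x_{t+1} = x_t + gamma_t (s_t + w_t),
# on an illustrative quadratic f(x) = 0.5 * ||x||^2 (assumption, not from the paper).
import numpy as np

rng = np.random.default_rng(0)

def grad_f(x):
    # Gradient of the illustrative objective f(x) = 0.5 * ||x||^2.
    return x

x = np.array([5.0, -3.0])
for t in range(10_000):
    gamma = 1.0 / (t + 1)                     # diminishing stepsize, sum gamma_t = inf
    s = -grad_f(x)                            # descent direction s_t
    w = rng.normal(scale=0.5, size=x.shape)   # stochastic error w_t
    x = x + gamma * (s + w)

# Under the paper's conditions one expects grad f(x_t) -> 0 (w.p. 1 in the stochastic case);
# here the gradient norm should be small after many iterations.
print(x, np.linalg.norm(grad_f(x)))
```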
