INCREMENTAL SUBGRADIENT METHODS FOR NONDIFFERENTIABLE OPTIMIZATION

We consider a class of subgradient methods for minimizing a convex function that consists of the sum of a large number of component functions. This type of minimization arises in a dual context from Lagrangian relaxation of the coupling constraints of large scale separable problems. The idea is to perform the subgradient iteration incrementally, by sequentially taking steps along the subgradients of the component functions, with intermediate adjustment of the variables after processing each component function. This incremental approach has been very successful in solving large differentiable least squares problems, such as those arising in the training of neural networks, and it has resulted in a much better practical rate of convergence than the steepest descent method. In this paper, we establish the convergence properties of a number of variants of incremental subgradient methods, including some that are stochastic. Based on the analysis and computational experiments, the methods appear very promising and effective for important classes of large problems. A particularly interesting discovery is that by randomizing the order of selection of component functions for iteration, the convergence rate is substantially improved.

Research supported by NSF under Grant ACI-9873339.
Dept. of Electrical Engineering and Computer Science, M.I.T., Cambridge, Mass., 02139.
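
To make the incremental idea concrete, the following is a minimal sketch (not the authors' code) of the iteration described in the abstract: the variables are updated after each component subgradient step, and the order of the components can optionally be randomized once per pass. The particular component functions f_i(x) = |a_i'x - b_i|, the diminishing stepsize rule alpha0/(k+1), and the per-pass shuffling are illustrative assumptions.

    import numpy as np

    def incremental_subgradient(A, b, num_passes=100, alpha0=1.0,
                                randomize=True, seed=0):
        """Minimize sum_i |A[i] @ x - b[i]| by stepping along one
        component subgradient at a time (illustrative sketch)."""
        rng = np.random.default_rng(seed)
        m, n = A.shape
        x = np.zeros(n)
        for k in range(num_passes):
            alpha = alpha0 / (k + 1)  # assumed diminishing stepsize
            # process components in cyclic or randomly shuffled order
            order = rng.permutation(m) if randomize else np.arange(m)
            for i in order:
                # subgradient of the single component |A[i] @ x - b[i]|
                g = np.sign(A[i] @ x - b[i]) * A[i]
                x = x - alpha * g  # intermediate adjustment after each component
        return x

    # Example use on synthetic data
    rng = np.random.default_rng(1)
    A = rng.normal(size=(200, 5))
    b = A @ np.arange(5.0)
    print(incremental_subgradient(A, b))

Setting randomize=True corresponds to the randomized order of component selection whose improved convergence rate is highlighted in the abstract; randomize=False gives the plain cyclic variant.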
