The Chaotic Nature of Faster Gradient Descent Methods

The steepest descent method for large linear systems is well known to often converge very slowly: the number of iterations required is about the same as that of gradient descent with the best constant step size, and it grows proportionally to the condition number of the matrix. Faster gradient descent methods must occasionally resort to significantly larger step sizes, which in turn yields a rather non-monotone decrease pattern in the residual norm. We show that such faster gradient descent methods in fact generate chaotic dynamical systems for the normalized residual vectors. Very little is required to generate chaos here: simply damping steepest descent by a constant factor close to 1 will do. Several variants of this family of faster gradient descent methods are investigated, both experimentally and analytically. The fastest practical methods of the family generally appear to be the known, chaotic, two-step ones. Our results also highlight the need for better theory for existing faster gradient descent methods.
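To make the trade-off concrete, here is a minimal numerical sketch, not the paper's experiments: it compares plain steepest descent, steepest descent damped by a constant factor, and the two-step lagged (Barzilai-Borwein) method on a small ill-conditioned SPD system. The test matrix, its condition number (1e4), the damping factor theta = 0.9, and the iteration count are all illustrative choices.

```python
# A minimal sketch (illustrative setup, not the paper's experiments).
import numpy as np

rng = np.random.default_rng(0)
n = 100
eigs = np.logspace(0.0, 4.0, n)                   # prescribed SPD spectrum
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(eigs) @ Q.T                       # SPD, cond(A) = 1e4
b = rng.standard_normal(n)

def cauchy(r):
    """Exact line-search (Cauchy) step for residual r; uses the
    global A to keep the sketch short."""
    return (r @ r) / (r @ (A @ r))

def damped(r, theta=0.9):
    """Steepest descent damped by a constant factor close to 1
    (theta = 0.9 is an illustrative value)."""
    return theta * cauchy(r)

class Lagged:
    """Two-step / Barzilai-Borwein method on a quadratic: use the
    Cauchy step length of the *previous* residual."""
    def __init__(self):
        self.prev = None
    def __call__(self, r):
        alpha = cauchy(r) if self.prev is None else self.prev
        self.prev = cauchy(r)
        return alpha

def run(policy, iters=300):
    """Gradient descent x_{k+1} = x_k + alpha_k r_k for A x = b,
    recording the residual-norm history."""
    x = np.zeros_like(b)
    norms = []
    for _ in range(iters):
        r = b - A @ x
        x = x + policy(r) * r
        norms.append(float(np.linalg.norm(b - A @ x)))
    return norms

for name, policy in [("steepest", cauchy), ("damped", damped),
                     ("lagged/BB", Lagged())]:
    norms = run(policy)
    ups = sum(v > u for u, v in zip(norms, norms[1:]))
    print(f"{name:>10}: final residual {norms[-1]:.3e}, "
          f"{ups} non-monotone steps out of {len(norms) - 1}")
```

On runs like this, the two faster variants typically reach a small residual in far fewer iterations than plain steepest descent, while their residual-norm histories are strongly non-monotone; this is the behavior the abstract ties to chaotic dynamics of the normalized residuals.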
