On the Asymptotic Linear Convergence Speed of Anderson Acceleration, Nesterov Acceleration, and Nonlinear GMRES

We consider nonlinear convergence acceleration methods for fixed-point iteration $x_{k+1}=q(x_k)$, including Anderson acceleration (AA), nonlinear GMRES (NGMRES), and Nesterov-type acceleration (corresponding to AA with window size one). We focus on fixed-point methods that converge asymptotically linearly with convergence factor $\rho<1$ and that solve an underlying fully smooth and non-convex optimization problem. It is often observed that AA and NGMRES substantially improve the asymptotic convergence behavior of the fixed-point iteration, but this improvement has not been quantified theoretically. We investigate this problem under simplified conditions. First, we consider stationary versions of AA and NGMRES, and determine coefficients that result in optimal asymptotic convergence factors, given knowledge of the spectrum of $q'(x)$ at the fixed point $x^*$. This allows us to understand and quantify the asymptotic convergence improvement that can be provided by nonlinear convergence acceleration, viewing $x_{k+1}=q(x_k)$ as a nonlinear preconditioner for AA and NGMRES. Second, for the case of infinite window size, we consider linear asymptotic convergence bounds for GMRES applied to the fixed-point iteration linearized about $x^*$. Since AA and NGMRES are equivalent to GMRES in the linear case, one may expect the GMRES convergence factors to be relevant for AA and NGMRES as $x_k \rightarrow x^*$. Our results are illustrated numerically for a class of test problems from canonical tensor decomposition, comparing steepest descent and alternating least squares (ALS) as the fixed-point iterations that are accelerated by AA and NGMRES. Our numerical tests show that both approaches allow us to estimate asymptotic convergence speed for nonstationary AA and NGMRES with finite window size.
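The window-size-one case mentioned above (AA with $m=1$, the Nesterov-like setting) can be sketched for a linear fixed-point map $q(x)=Ax+b$. This is a minimal illustrative sketch, not the paper's implementation: the function name `anderson_m1` and the least-squares mixing formula for the single coefficient are standard for AA(1) but are assumptions here, chosen to show how the accelerated iterate combines the two most recent residuals.

```python
import numpy as np

def anderson_m1(q, x0, iters=200):
    """Anderson acceleration with window size 1 (AA(1)) for x = q(x).

    Each step solves the 1D least-squares problem
        min_alpha || r_k - alpha (r_k - r_{k-1}) ||_2
    and sets x_{k+1} = q(x_k) - alpha (q(x_k) - q(x_{k-1})),
    i.e. the new iterate mixes the two most recent fixed-point updates.
    """
    x_prev = np.asarray(x0, dtype=float)
    x_curr = q(x_prev)
    for _ in range(iters):
        r_curr = q(x_curr) - x_curr          # current residual
        r_prev = q(x_prev) - x_prev          # previous residual
        dr = r_curr - r_prev
        denom = dr @ dr
        # Closed-form least-squares coefficient; fall back to the plain
        # fixed-point step when the residual difference vanishes.
        alpha = (r_curr @ dr) / denom if denom > 0 else 0.0
        x_next = (1 - alpha) * (x_curr + r_curr) + alpha * (x_prev + r_prev)
        x_prev, x_curr = x_curr, x_next
    return x_curr
```

For example, with the contraction $q(x) = \mathrm{diag}(0.9, 0.5)\,x + (1,1)^T$ (asymptotic convergence factor $\rho = 0.9$ for the plain iteration), AA(1) recovers the fixed point $(10, 2)^T$; comparing residual histories against the unaccelerated iteration gives an empirical view of the convergence-factor improvement the paper quantifies.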
