Quadratic optimization with orthogonality constraint: explicit Łojasiewicz exponent and linear convergence of retraction-based line-search and stochastic variance-reduced gradient methods

The problem of optimizing a quadratic form over an orthogonality constraint (QP-OC for short) is one of the most fundamental matrix optimization problems and arises in many applications. In this paper, we characterize the growth behavior of the objective function around the critical points of the QP-OC problem. We then demonstrate how such a characterization can be used to obtain strong convergence rate results for iterative methods that find a critical point of the problem by exploiting the manifold structure of the orthogonality constraint (i.e., the Stiefel manifold). Specifically, our primary contribution is to show that the Łojasiewicz exponent at any critical point of the QP-OC problem is 1/2. This result is significant, as it expands the currently very limited repertoire of optimization problems for which the Łojasiewicz exponent is explicitly known. Moreover, it allows us to show, in a unified manner and for the first time, that a large family of retraction-based line-search methods converges linearly to a critical point of the QP-OC problem. As our secondary contribution, we propose a stochastic variance-reduced gradient (SVRG) method called Stiefel-SVRG for solving the QP-OC problem, and we present a novel linear convergence analysis of the method based on the Łojasiewicz inequality. An important feature of Stiefel-SVRG is that it allows for general retractions and does not require the computation of any vector transport on the Stiefel manifold. As a result, it is computationally more advantageous than other recently proposed SVRG-type algorithms for manifold optimization.
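The abstract does not write out either the problem or the inequality. One common way to state them uses the trace form with symmetric A ∈ ℝ^{n×n} and B ∈ ℝ^{m×m}; this form and the symbols κ, δ and c are notation introduced here for illustration. Under those assumptions, the display below gives the QP-OC problem, its Riemannian gradient under the embedded metric, and a Łojasiewicz inequality with exponent 1/2 near a critical point X̄.

The last line is the standard one-step argument for why exponent 1/2 yields a linear rate. It relies on two further assumptions: the iterates satisfy a sufficient-decrease property with some constant c > 0, and they stay in the neighborhood where the inequality holds with f(X_k) ≥ f(X̄).

```latex
\min_{X \in \mathrm{St}(m,n)} f(X) = \operatorname{tr}\left(X^{\top} A X B\right),
\qquad
\mathrm{St}(m,n) = \left\{ X \in \mathbb{R}^{n \times m} : X^{\top} X = I_m \right\}

\operatorname{grad} f(X) = \nabla f(X) - X \operatorname{sym}\left(X^{\top} \nabla f(X)\right),
\qquad \nabla f(X) = 2AXB,
\qquad \operatorname{sym}(M) = \tfrac{1}{2}\left(M + M^{\top}\right)

\left| f(X) - f(\bar{X}) \right|^{1/2} \le \kappa \left\| \operatorname{grad} f(X) \right\|_F
\quad \text{whenever } X \in \mathrm{St}(m,n),\ \|X - \bar{X}\|_F \le \delta

f(X_k) - f(X_{k+1}) \ge c \left\| \operatorname{grad} f(X_k) \right\|_F^2
\ge \frac{c}{\kappa^2}\left( f(X_k) - f(\bar{X}) \right)
\;\Longrightarrow\;
f(X_{k+1}) - f(\bar{X}) \le \left(1 - \frac{c}{\kappa^2}\right)\left( f(X_k) - f(\bar{X}) \right)
```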
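The abstract refers to a family of retraction-based line-search methods without fixing a particular retraction or step-size rule. The sketch below shows one member of that family. It rests on several assumptions, all choices of this example and not necessarily the variants analyzed in the paper:

- the trace form f(X) = tr(XᵀAXB) with symmetric A and B;
- the embedded (Euclidean) metric;
- the QR-based retraction;
- Armijo backtracking.

The helper names `qf`, `objective`, `riemannian_grad` and `retraction_line_search` are hypothetical, and so are the default parameters.

```python
import numpy as np


def qf(Y):
    """Q factor of a thin QR decomposition, signs chosen so that diag(R) > 0."""
    Q, R = np.linalg.qr(Y)  # reduced mode: Q is n x m, R is m x m
    return Q * np.where(np.diag(R) < 0, -1.0, 1.0)  # flip columns where R[j, j] < 0


def objective(X, A, B):
    """f(X) = tr(X^T A X B)."""
    return np.trace(X.T @ A @ X @ B)


def riemannian_grad(X, A, B):
    """Riemannian gradient of f on St(m, n) under the embedded (Euclidean) metric."""
    G = 2.0 * A @ X @ B                   # Euclidean gradient, valid for symmetric A and B
    XtG = X.T @ G
    return G - X @ ((XtG + XtG.T) / 2.0)  # project G onto the tangent space at X


def retraction_line_search(A, B, X0, tol=1e-5, max_iter=5000,
                           t0=1.0, beta=0.5, sigma=1e-4, max_backtracks=60):
    """Riemannian gradient descent with QR retraction and Armijo backtracking."""
    X = X0
    history = [objective(X, A, B)]
    for _ in range(max_iter):
        g = riemannian_grad(X, A, B)
        gnorm2 = float(np.sum(g * g))
        if np.sqrt(gnorm2) < tol:         # stop near a critical point
            break
        f, t = history[-1], t0
        for _ in range(max_backtracks):
            X_new = qf(X - t * g)         # retract the step -t * g back onto St(m, n)
            f_new = objective(X_new, A, B)
            if f_new <= f - sigma * t * gnorm2:  # Armijo sufficient-decrease test
                break
            t *= beta                     # shrink the trial step and try again
        X = X_new
        history.append(f_new)
    return X, history
```

A well-conditioned test instance is A = Q diag(1, 2, ..., 8) Qᵀ with a random orthogonal Q, together with B = diag(3, 2, 1). On such an instance the recorded objective values should decrease monotonically, and the loop should stop once the Riemannian gradient norm drops below `tol`. Because only `qf` touches the manifold, another retraction (for example, a polar one) can be swapped in without changing the rest of the loop.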
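The abstract highlights that Stiefel-SVRG needs only a retraction and no vector transport. The sketch below shows one way such an update can be organized, assuming a finite-sum model A = (1/N) Σᵢ Aᵢ with symmetric Aᵢ. The variance-reduced direction is formed in the ambient matrix space from the snapshot's full gradient. It is then projected onto the tangent space at the current iterate, so nothing has to be moved between tangent spaces.

This is an illustrative assumption, not a transcription of the paper's algorithm. The paper's exact update, step-size rule and epoch length may differ, and the names `svrg_stiefel`, `tangent_project` and `qf` are hypothetical.

```python
import numpy as np


def qf(Y):
    """Q factor of a thin QR decomposition, signs chosen so that diag(R) > 0."""
    Q, R = np.linalg.qr(Y)
    return Q * np.where(np.diag(R) < 0, -1.0, 1.0)


def tangent_project(X, V):
    """Orthogonal projection of an ambient n x m matrix V onto the tangent space at X."""
    XtV = X.T @ V
    return V - X @ ((XtV + XtV.T) / 2.0)


def svrg_stiefel(A_list, B, X0, step=0.01, epochs=80, inner=100, seed=0):
    """SVRG-type sketch for min (1/N) sum_i tr(X^T A_i X B) over St(m, n).

    Uses a retraction (QR) and a tangent-space projection; no vector transport.
    """
    rng = np.random.default_rng(seed)
    N = len(A_list)
    A_mean = sum(A_list) / N
    X = X0
    for _ in range(epochs):
        X_snap = X                          # snapshot iterate for this epoch
        G_snap = 2.0 * A_mean @ X_snap @ B  # full Euclidean gradient at the snapshot
        for _ in range(inner):
            i = rng.integers(N)             # sample one component uniformly
            # Euclidean variance-reduced direction:
            # nabla f_i(X) - nabla f_i(X_snap) + nabla f(X_snap) = 2 A_i (X - X_snap) B + G_snap
            V = 2.0 * A_list[i] @ (X - X_snap) @ B + G_snap
            xi = tangent_project(X, V)      # project at X instead of transporting vectors
            X = qf(X - step * xi)           # retract back onto St(m, n)
    return X
```

Averaged over the sampled index i, V equals the full Euclidean gradient 2AXB, where A is the mean of the Aᵢ. The projected direction `xi` is therefore an unbiased estimate of the Riemannian gradient at X. Its variance shrinks as X approaches the snapshot. The constant step size and epoch length are placeholders for a small, well-conditioned example, not values prescribed by the paper's analysis.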
