Alternating minimization and alternating descent over nonconvex sets

We analyze the performance of alternating minimization for loss functions optimized over two blocks of variables, where each block may be restricted to lie in a potentially nonconvex constraint set. This setting arises naturally in high-dimensional statistics and signal processing, where the two blocks often represent distinct structures or components of the signal under study. Our analysis relies on the local concavity coefficients, proposed by Barber and Ha [27] to quantify the concavity of a general nonconvex set. Our results also reveal important distinctions between alternating and non-alternating methods. Since computing the exact alternating minimization steps may be intractable for some problems, we further consider an inexact version of the algorithm and give sufficient conditions ensuring its fast convergence. We demonstrate our framework on several examples, including low-rank + sparse matrix decomposition and multitask regression, and provide numerical experiments to validate our theoretical results.
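
For concreteness, below is a minimal Python sketch of exact alternating minimization for the low-rank + sparse decomposition example mentioned above. It illustrates the general template (alternately minimizing the loss exactly over each nonconvex constraint set) rather than the paper's precise algorithm or guarantees; the function names and synthetic data are our own illustrative choices. In this special case each inner minimization has a closed form: the best rank-r approximation via truncated SVD (Eckart-Young) for the rank constraint, and hard thresholding for the sparsity constraint.

```python
# Minimal sketch (illustrative, not the paper's algorithm) of exact
# alternating minimization over two nonconvex sets:
#   minimize  ||Y - L - S||_F^2   over  rank(L) <= r,  ||S||_0 <= k.
import numpy as np

def project_rank(A, r):
    """Best rank-r approximation of A: exact minimizer of ||A - L||_F over rank(L) <= r."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

def project_sparse(A, k):
    """Keep the k largest-magnitude entries of A: exact minimizer over ||S||_0 <= k."""
    S = np.zeros_like(A)
    idx = np.argsort(np.abs(A), axis=None)[-k:]  # flat indices of the k largest entries
    S.flat[idx] = A.flat[idx]
    return S

def altmin_lowrank_sparse(Y, r, k, n_iters=50):
    """Alternate exact minimization: L-step with S fixed, then S-step with L fixed."""
    L, S = np.zeros_like(Y), np.zeros_like(Y)
    for _ in range(n_iters):
        L = project_rank(Y - S, r)    # argmin over the rank-r set, S held fixed
        S = project_sparse(Y - L, k)  # argmin over the k-sparse set, L held fixed
    return L, S

# Synthetic check: a rank-3 matrix plus a 100-sparse matrix with large entries.
rng = np.random.default_rng(0)
n, r, k = 50, 3, 100
L_true = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
S_true = project_sparse(10 * rng.standard_normal((n, n)), k)
Y = L_true + S_true
L_hat, S_hat = altmin_lowrank_sparse(Y, r, k)
print(np.linalg.norm(L_hat - L_true) / np.linalg.norm(L_true))  # relative error in L
```

Here each alternating step happens to be a projection with a closed form. For problems where the exact step is expensive or intractable, the inexact variant discussed in the abstract would replace each argmin with an approximate minimizer (e.g., a few descent steps on the corresponding block).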

[1] J. M. Ortega, et al. Iterative solution of nonlinear equations in several variables. Computer Science and Applied Mathematics, 2014.

[2] A. Izenman. Reduced-rank regression for the multivariate linear model, 1975.

[3] A. Auslender. Optimisation: méthodes numériques, 1976.

[4] Z.-Q. Luo, et al. Error bounds and convergence analysis of feasible descent methods: a general approach. Ann. Oper. Res., 1993.

[5] S. Szarek, et al. Local operator theory, random matrices and Banach spaces, 2001.

[6] Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization, 2014.

[7] Y. Nesterov. Gradient methods for minimizing composite objective function, 2007.

[8] M. J. Wainwright, et al. Estimation of (near) low-rank matrices with noise and high-dimensional scaling. ICML, 2009.

[9] M. J. Wainwright, et al. Fast global convergence rates of gradient methods for high-dimensional statistical recovery. NIPS, 2010.

[10] M. J. Wainwright, et al. Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions. ICML, 2011.

[11] Y. Ma, et al. Robust principal component analysis? J. ACM, 2009.

[12] P. A. Parrilo, et al. Rank-sparsity incoherence for matrix decomposition. SIAM J. Optim., 2009.

[13] Y. Nesterov. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim., 2012.

[14] M. J. Wainwright, et al. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. J. Mach. Learn. Res., 2010.

[15] P.-L. Loh, et al. Regularized M-estimators with nonconvexity: statistical and algorithmic theory for local optima. J. Mach. Learn. Res., 2013.

[16] J. A. Tropp, et al. Living on the edge: phase transitions in convex programs with random data. arXiv:1303.6672, 2013.

[17] P. Jain, et al. On iterative hard thresholding methods for high-dimensional M-estimation. NIPS, 2014.

[18] C. Uhler, et al. Maximum likelihood estimation for linear Gaussian covariance models. arXiv:1408.5604, 2014.

[19] P. Jain, et al. Alternating minimization for regression problems with vector-valued outputs. NIPS, 2015.

[20] S. Bubeck. Convex Optimization: Algorithms and Complexity. Found. Trends Mach. Learn., 2014.

[21] J. D. Lafferty, et al. A convergent gradient descent algorithm for rank minimization and semidefinite programming from random linear measurements. NIPS, 2015.

[22] A. Beck. On the convergence of alternating minimization for convex programming with applications to iteratively reweighted least squares and decomposition schemes. SIAM J. Optim., 2015.

[23] M. J. Wainwright, et al. Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees. arXiv preprint, 2015.

[24] C. Caramanis, et al. Fast algorithms for robust PCA via gradient descent. NIPS, 2016.

[25] M. Simchowitz, et al. Low-rank solutions of linear matrix equations via Procrustes flow. ICML, 2015.

[26] Z. Wang, et al. Low-rank and sparse structure pursuit via alternating minimization. AISTATS, 2016.

[27] R. F. Barber and W. Ha. Gradient descent with nonconvex constraints: local concavity determines convergence. arXiv:1703.07755, 2017.

[28] B. Recht, et al. Sharp time-data tradeoffs for linear inverse problems. IEEE Trans. Inf. Theory, 2015.

[29] J. Diakonikolas, et al. Alternating randomized block coordinate descent. ICML, 2018.

[30] A. Rantzer, et al. Low-rank optimization with convex constraints. IEEE Trans. Autom. Control, 2016.