On Matching Pursuit and Coordinate Descent

Coordinate descent and matching pursuit, together with their randomized variants, are two popular first-order optimization methods over linear spaces. While the former optimizes by moving along coordinate directions, the latter works with a more general notion of directions. Exploiting the connection between the two algorithms, we present a unified analysis of both, providing affine-invariant sublinear $\mathcal{O}(1/t)$ rates on smooth objectives and linear convergence on strongly convex objectives. As a byproduct of our affine-invariant analysis of matching pursuit, our rates for steepest coordinate descent are the tightest known. Furthermore, we show the first accelerated convergence rate $\mathcal{O}(1/t^2)$ for matching pursuit and steepest coordinate descent on convex objectives.
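To make the connection between the two methods concrete, the following is a minimal sketch, not the paper's exact algorithm, of a non-accelerated generalized matching pursuit step on a small quadratic. Everything named here is an assumption made for illustration: the function `matching_pursuit`, the test objective $f(x) = \tfrac{1}{2} x^\top Q x - b^\top x$, the Euclidean smoothness constant `L`, and the fixed step size $-\langle \nabla f(x), z \rangle / (L \|z\|^2)$. When the atom set is the standard basis, the same loop is steepest coordinate descent.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(grad, atoms, x0, L, steps):
    """Illustrative generalized matching pursuit (hypothetical helper).

    At each step, pick the atom z with the largest normalized correlation
    |<grad f(x), z>| / ||z||, then move along z with the step size
    -<grad f(x), z> / (L * ||z||^2), assuming f is L-smooth in the
    Euclidean norm. With atoms equal to the standard basis vectors this
    is steepest coordinate descent.
    """
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        z = max(atoms, key=lambda a: abs(dot(g, a)) / math.sqrt(dot(a, a)))
        gamma = -dot(g, z) / (L * dot(z, z))
        x = [xi + gamma * zi for xi, zi in zip(x, z)]
    return x


# Assumed test problem: f(x) = 0.5 * x^T Q x - b^T x, minimized at Q x = b.
Q = [[2.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]
L = 3.0  # largest eigenvalue of Q


def grad(x):
    """Gradient of the quadratic: Q x - b."""
    return [dot(row, x) - bi for row, bi in zip(Q, b)]


# Steepest coordinate descent: atoms are the coordinate directions.
basis = [[1.0, 0.0], [0.0, 1.0]]
x_cd = matching_pursuit(grad, basis, [0.0, 0.0], L, steps=200)

# Matching pursuit with one extra diagonal atom added to the dictionary.
s = 1.0 / math.sqrt(2.0)
dictionary = basis + [[s, s]]
x_mp = matching_pursuit(grad, dictionary, [0.0, 0.0], L, steps=200)

print(x_cd, x_mp)
```

Both runs approach the minimizer $(1/3, 1/3)$ of this particular quadratic. The only difference between them is the set of directions passed in, which is the sense in which matching pursuit generalizes coordinate descent.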
