Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization

We provide stronger and more general primal-dual convergence results for Frank-Wolfe-type algorithms (also known as conditional gradient methods) for constrained convex optimization, enabled by a simple framework of duality gap certificates. Our analysis also holds if the linear subproblems are only solved approximately, as well as if the gradients are inexact, and is proven to be worst-case optimal in the sparsity of the obtained solutions. On the application side, this allows us to unify a large variety of existing sparse greedy methods, in particular for optimization over convex hulls of an atomic set, even if those sets can only be approximated. Examples include sparse (or structured sparse) vectors or matrices, low-rank matrices, permutation matrices, and max-norm bounded matrices. We present a new general framework for convex optimization over matrix factorizations, where every Frank-Wolfe iteration consists of a low-rank update, and discuss the broad application areas of this approach.
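To make the ingredients named in the abstract concrete, the following is a minimal sketch of a Frank-Wolfe iteration with a duality gap certificate. It is an illustration under stated assumptions, not the paper's own pseudocode: the objective (least squares), the domain (the probability simplex, i.e. the convex hull of the unit-vector atoms), the step size 2/(k+2), and the function name `frank_wolfe_simplex` are all choices made here for the example.

```python
import numpy as np


def frank_wolfe_simplex(A, b, num_iters=50):
    """Sketch: minimize f(x) = 0.5 * ||A x - b||^2 over the probability simplex.

    Illustrative assumptions (not taken from the paper): least-squares
    objective, simplex domain, and the open-loop step size 2 / (k + 2).
    """
    n = A.shape[1]
    x = np.zeros(n)
    x[0] = 1.0  # start at a vertex, i.e. a single atom

    for k in range(num_iters):
        grad = A.T @ (A @ x - b)
        # Linear subproblem: the minimizer of <s, grad> over the simplex
        # is the unit vector at the smallest gradient coordinate.
        i = int(np.argmin(grad))
        gamma = 2.0 / (k + 2.0)
        # Convex combination with the chosen atom; adds at most one nonzero.
        x = (1.0 - gamma) * x
        x[i] += gamma

    # Duality gap at the returned point: g(x) = max_s <x - s, grad f(x)>.
    # By convexity, g(x) >= f(x) - f(y) for every feasible y.
    grad = A.T @ (A @ x - b)
    gap = float(grad @ x - grad.min())
    return x, gap


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 50))
    b = rng.standard_normal(20)
    x, gap = frank_wolfe_simplex(A, b, num_iters=10)
    print("nonzeros:", np.count_nonzero(x), "duality gap:", gap)
```

No projection is needed because each iterate is a convex combination of atoms, so the number of nonzeros grows by at most one per iteration. The gap is computed from the gradient alone and upper-bounds the primal error without knowledge of the optimum.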
