Conditional gradient algorithms for norm-regularized smooth convex optimization

Motivated by some applications in signal processing and machine learning, we consider two convex optimization problems where, given a cone $$K$$, a norm $$\Vert \cdot \Vert$$ and a smooth convex function $$f$$, we want either (1) to minimize the norm over the intersection of the cone and a level set of $$f$$, or (2) to minimize over the cone the sum of $$f$$ and a multiple of the norm. We focus on the case where (a) the dimension of the problem is too large to allow for interior point algorithms, and (b) $$\Vert \cdot \Vert$$ is “too complicated” to allow for the computationally cheap Bregman projections required in first-order proximal gradient algorithms. On the other hand, we assume that it is relatively easy to minimize linear forms over the intersection of $$K$$ and the unit $$\Vert \cdot \Vert$$-ball. Motivating examples are given by the nuclear norm with $$K$$ being the entire space of matrices, or the positive semidefinite cone in the space of symmetric matrices, and the Total Variation norm on the space of 2D images. We discuss versions of the Conditional Gradient algorithm capable of handling our problems of interest, provide the related theoretical efficiency estimates, and outline some applications.
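
To make the "easy linear minimization" assumption concrete, here is a minimal sketch of the classical conditional gradient (Frank-Wolfe) iteration in the nuclear-norm example, with $$K$$ taken as the entire space of matrices. It is an illustration only, not one of the algorithm versions developed in the paper: it treats the simpler problem of minimizing $$f$$ over the nuclear-norm ball itself, and the function names, the toy quadratic objective, and the open-loop step size $$2/(t+2)$$ are assumptions made for this sketch. Each iteration needs only a leading singular pair of the gradient, with no projection onto the ball.

```python
import numpy as np


def lmo_nuclear_ball(grad, radius=1.0):
    """Linear minimization oracle (hypothetical helper name).

    Returns a minimizer of <grad, S> over {S : ||S||_nuc <= radius},
    which is -radius * u1 v1^T for a leading singular pair (u1, v1) of grad.
    """
    # A full SVD is used for simplicity; at large scale one would compute
    # only the leading singular pair (e.g. with a power or Lanczos method).
    u, _, vt = np.linalg.svd(grad, full_matrices=False)
    return -radius * np.outer(u[:, 0], vt[0, :])


def conditional_gradient(grad_f, x0, radius=1.0, n_iters=200):
    """Classical conditional gradient with the open-loop step 2 / (t + 2)."""
    x = x0.copy()
    for t in range(n_iters):
        s = lmo_nuclear_ball(grad_f(x), radius)  # one oracle call per step
        gamma = 2.0 / (t + 2.0)
        x = (1.0 - gamma) * x + gamma * s        # convex combination stays feasible
    return x


# Toy problem (assumed for illustration): f(x) = 0.5 * ||x - target||_F^2,
# where target is a rank-one matrix scaled to nuclear norm exactly 1.
rng = np.random.default_rng(0)
target = np.outer(rng.standard_normal(8), rng.standard_normal(6))
target /= np.linalg.norm(target, ord="nuc")


def grad_f(x):
    return x - target


x_hat = conditional_gradient(grad_f, np.zeros_like(target), radius=1.0, n_iters=500)
```

Because every iterate is a convex combination of points in the ball, feasibility is maintained without a projection step, which is the feature that makes this family of methods attractive when proximal or Bregman projections are expensive.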
