Non-convex Optimization with Frank-Wolfe Algorithm and Its Variants

Recently, the Frank-Wolfe (a.k.a. conditional gradient) algorithm has become a popular tool for tackling machine learning problems, as it avoids the costly projection step required by traditional first-order optimization methods. While the Frank-Wolfe (FW) algorithm has been extensively studied for convex optimization, little is known about its behavior in non-convex optimization. This paper presents a unified convergence analysis for the FW algorithm and its variants in the setting of a non-convex but smooth objective with a convex, compact constraint set. Our results are based on a novel observation on the so-called Frank-Wolfe gap (FW gap), which measures the closeness of a solution to a stationary point. With a diminishing step size, we show that the FW gap decays at a rate of $O(\sqrt{1/t})$, and that the same rate holds for variants of FW such as the online FW algorithm and the decentralized FW algorithm. Numerical experiments are presented to support our findings.
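
To make the quantities above concrete, the following is a minimal illustrative sketch and not the implementation used in the paper. It runs the basic FW iteration on a non-convex quadratic over the probability simplex and records the FW gap, taken here in its standard form $g_t = \max_{s \in \mathcal{C}} \langle \nabla f(x_t), x_t - s \rangle$. The test objective, the choice of the simplex as the constraint set, the step size $\gamma_t = 1/\sqrt{t+1}$, and the names `lmo_simplex` and `frank_wolfe` are all assumptions made for illustration.

```python
import numpy as np

def lmo_simplex(grad):
    """Linear minimization oracle over the probability simplex.

    Returns the vertex e_i that minimizes <grad, s> over the simplex.
    """
    s = np.zeros_like(grad)
    s[np.argmin(grad)] = 1.0
    return s

def frank_wolfe(grad_f, x0, num_iters=500):
    """Basic FW loop with an assumed diminishing step size 1/sqrt(t+1)."""
    x = x0.copy()
    gaps = []
    for t in range(1, num_iters + 1):
        g = grad_f(x)
        s = lmo_simplex(g)               # projection-free: one linear problem
        gaps.append(float(g @ (x - s)))  # FW gap at the current iterate
        gamma = 1.0 / np.sqrt(t + 1)     # assumed step size, lies in (0, 1)
        x = x + gamma * (s - x)          # convex combination stays feasible
    return x, gaps

# Hypothetical test problem: f(x) = 0.5 * x^T A x with a random symmetric
# matrix A, which is generally indefinite, so f is smooth but non-convex.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2.0
x0 = np.full(5, 0.2)

x_final, gaps = frank_wolfe(lambda x: A @ x, x0)
print("initial FW gap:", gaps[0])
print("smallest FW gap seen:", min(gaps))
```

The gap is non-negative at every feasible point and equals zero exactly at stationary points of the constrained problem, which is why it serves as the convergence measure. Each iteration solves only a linear problem over the constraint set, so no projection is needed.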
