Paper Information - Non-convex Optimization with Frank-Wolfe Algorithm and Its Variants

Recently, the Frank-Wolfe (a.k.a. conditional gradient) algorithm has become a popular tool for tackling machine learning problems, as it avoids the costly projection step required by traditional first-order optimization methods. While the Frank-Wolfe (FW) algorithm has been extensively studied for convex optimization, little is known about its behavior in non-convex optimization. This paper presents a unified convergence analysis for the FW algorithm and its variants in the setting of a non-convex but smooth objective with a convex, compact constraint set. Our results are based on a novel observation about the so-called Frank-Wolfe gap (FW gap), which measures how close a solution is to a stationary point. With a diminishing step size, we show that the FW gap decays at a rate of O(√(1/t)), and that the same rate holds for variants of FW such as the online FW algorithm and the decentralized FW algorithm. Numerical experiments are presented to support our findings.
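To make the FW gap and the projection-free update concrete, the following is a minimal sketch, not the paper's implementation. It assumes a toy problem chosen only for illustration: a non-convex quadratic f(x) = 0.5 xᵀAx + bᵀx with an indefinite matrix A, minimized over the probability simplex. The names `gradient` and `frank_wolfe_simplex`, the data `A` and `b`, and the step size 1/√(t+1) are all assumptions; the abstract only says the step size is diminishing.

```python
import math


def gradient(A, b, x):
    """Gradient of f(x) = 0.5 * x^T A x + b^T x for a symmetric matrix A."""
    n = len(x)
    return [sum(A[i][j] * x[j] for j in range(n)) + b[i] for i in range(n)]


def frank_wolfe_simplex(A, b, x0, num_iters):
    """Run FW over the probability simplex; return the last iterate and the FW gaps."""
    x = list(x0)
    gaps = []
    for t in range(1, num_iters + 1):
        g = gradient(A, b, x)
        # Linear minimization oracle: over the simplex, <g, s> is minimized
        # at the vertex e_i with i = argmin_i g[i]. No projection is needed.
        i_star = min(range(len(x)), key=lambda i: g[i])
        # FW gap: max over s in the simplex of <g, x - s> = <g, x> - g[i_star].
        # It is nonnegative and equals zero exactly at stationary points.
        gap = sum(gi * xi for gi, xi in zip(g, x)) - g[i_star]
        gaps.append(gap)
        # Assumed diminishing step size (illustrative choice, not from the source).
        gamma = 1.0 / math.sqrt(t + 1)
        # Convex combination of x and the vertex keeps the iterate feasible.
        x = [(1.0 - gamma) * xi for xi in x]
        x[i_star] += gamma
    return x, gaps


# Hypothetical problem data: A is symmetric and indefinite (A[1][1] < 0),
# so the objective is smooth but non-convex.
A = [[1.0, 2.0, 0.0],
     [2.0, -3.0, 1.0],
     [0.0, 1.0, 0.5]]
b = [0.1, -0.2, 0.3]

x_final, gaps = frank_wolfe_simplex(A, b, [1 / 3, 1 / 3, 1 / 3], 200)
best_gap = min(gaps)  # smallest FW gap observed over the run
```

Inspecting `gaps` shows how the stationarity measure evolves over the iterations. On this small instance the iterates move toward a vertex of the simplex, so the gap shrinks much faster than a worst-case rate would suggest; the sketch illustrates the quantities involved and does not verify the stated rate.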
