Conditional Gradient Methods via Stochastic Path-Integrated Differential Estimator

We propose a novel class of variance-reduced stochastic conditional gradient methods. By adapting the recent stochastic path-integrated differential estimator (SPIDER) technique of Fang et al. (2018) to the classical Frank-Wolfe (FW) method, we introduce SPIDER-FW for finite-sum minimization as well as the more general expectation-minimization problem. SPIDER-FW enjoys superior complexity guarantees in the non-convex setting, while matching the best-known FW variants in the convex case. We also extend our framework à la the conditional gradient sliding (CGS) method of Lan & Zhou (2016), and propose SPIDER-CGS.
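To make the template concrete, below is a minimal Python sketch of a SPIDER-type Frank-Wolfe loop in the spirit of the abstract: each epoch is anchored with a full (or large-batch) gradient, between iterations the estimator accumulates minibatch gradient differences along the optimization path, and the update itself only calls a linear minimization oracle (LMO) rather than a projection. The function names (spider_fw, grad_full, grad_batch, lmo), the step-size schedule, epoch length, and batch size are illustrative assumptions, not the paper's exact algorithm or parameter choices.

import numpy as np

def spider_fw(grad_full, grad_batch, lmo, x0, n_epochs=20, epoch_len=50,
              step=lambda t: 2.0 / (t + 2)):
    """Frank-Wolfe with a SPIDER-style variance-reduced gradient estimator (sketch).

    grad_full(x)        -- full (or large-batch) gradient, recomputed once per epoch
    grad_batch(x, seed) -- minibatch gradient; the same seed must draw the same
                           samples, so consecutive differences telescope
    lmo(v)              -- linear minimization oracle: argmin over s in C of <v, s>
    """
    rng = np.random.default_rng(0)
    x = np.asarray(x0, dtype=float).copy()
    t = 0
    for _ in range(n_epochs):
        v = grad_full(x)  # anchor the path-integrated estimator at the epoch start
        for i in range(epoch_len):
            d = lmo(v) - x                  # projection-free descent direction
            x_prev, x = x, x + step(t) * d  # convex combination stays feasible
            t += 1
            if i < epoch_len - 1:           # SPIDER update between consecutive iterates
                seed = int(rng.integers(2**31))
                v = v + grad_batch(x, seed) - grad_batch(x_prev, seed)
    return x

# Toy usage (hypothetical problem): least squares over the probability simplex.
rng = np.random.default_rng(1)
A = rng.normal(size=(200, 50))
b = A @ (np.ones(50) / 50.0)

def lmo(v):
    s = np.zeros_like(v)
    s[np.argmin(v)] = 1.0  # best simplex vertex for a linear objective
    return s

grad_full = lambda x: A.T @ (A @ x - b) / len(b)

def grad_batch(x, seed):
    idx = np.random.default_rng(seed).integers(0, len(b), size=16)
    return A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)

x_hat = spider_fw(grad_full, grad_batch, lmo, x0=np.ones(50) / 50.0)
print(0.5 * np.linalg.norm(A @ x_hat - b) ** 2 / len(b))  # final objective value

One detail worth noting in the sketch: the same minibatch (here, the same seed) is used when evaluating the gradient at both the current and the previous iterate; this coupling is what makes the differences telescope and the estimator's variance shrink along the path.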

[1] Elad Hazan. Sparse Approximate Solutions to Semidefinite Programs. LATIN, 2008.

[2] Aryan Mokhtari et al. Stochastic Conditional Gradient Methods: From Convex Minimization to Submodular Maximization. J. Mach. Learn. Res., 2018.

[3] Nicolas Le Roux et al. A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets. NIPS, 2012.

[4] Aaron Defazio et al. SAGA: A Fast Incremental Gradient Method with Support for Non-Strongly Convex Composite Objectives. NIPS, 2014.

[5] Chao Qu et al. Non-convex Conditional Gradient Sliding. ICML, 2018.

[6] Marguerite Frank and Philip Wolfe. An Algorithm for Quadratic Programming. Naval Research Logistics Quarterly, 1956.

[7] Rie Johnson and Tong Zhang. Accelerating Stochastic Gradient Descent Using Predictive Variance Reduction. NIPS, 2013.

[8] Simon Lacoste-Julien et al. Block-Coordinate Frank-Wolfe Optimization for Structural SVMs. ICML, 2013.

[9] Yaoliang Yu et al. Generalized Conditional Gradient for Sparse Estimation. J. Mach. Learn. Res., 2017.

[10] Martin Jaggi. Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization. ICML, 2013.

[11] Haihao Lu and Robert M. Freund. Generalized Stochastic Frank–Wolfe Algorithm with Stochastic "Substitute" Gradient for Structured Convex Optimization. Mathematical Programming, 2018.

[12] Guanghui Lan. The Complexity of Large-Scale Convex Programming under a Linear Optimization Oracle. arXiv:1309.5550, 2013.

[13] A. S. Nemirovsky and D. B. Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley, 1983.

[14] Cong Fang et al. SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator. NeurIPS, 2018.

[15] Lam M. Nguyen et al. SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient. ICML, 2017.

[16] Mehrdad Mahdavi et al. Mixed Optimization for Smooth Functions. NIPS, 2013.

[17] Guanghui Lan and Yi Zhou. Conditional Gradient Sliding for Convex Optimization. SIAM J. Optim., 2016.

[18] Alekh Agarwal and Léon Bottou. A Lower Bound for the Optimization of Finite Sums. ICML, 2015.

[19] Sashank J. Reddi et al. Stochastic Frank-Wolfe Methods for Nonconvex Optimization. Allerton, 2016.

[20] Elad Hazan and Haipeng Luo. Variance-Reduced and Projection-Free Stochastic Optimization. ICML, 2016.

[21] Sathya N. Ravi et al. Constrained Deep Learning Using Conditional Gradient and Applications in Computer Vision. arXiv, 2018.

[22] Hamed Hassani et al. Stochastic Conditional Gradient++. SIAM J. Optim., 2019.

[23] Zebang Shen et al. Complexities in Projection-Free Stochastic Non-convex Minimization. AISTATS, 2019.

[24] Simon Lacoste-Julien. Convergence Rate of Frank-Wolfe for Non-Convex Objectives. arXiv, 2016.

[25] Dimitri P. Bertsekas. Nonlinear Programming. Athena Scientific, 1997.

[26] Yurii Nesterov. A Method for Solving the Convex Programming Problem with Convergence Rate O(1/k^2). Dokl. Akad. Nauk SSSR, 1983.

[27] Zhe Wang et al. SpiderBoost: A Class of Faster Variance-Reduced Algorithms for Nonconvex Optimization. arXiv, 2018.

[28] Elad Hazan and Satyen Kale. Projection-free Online Learning. ICML, 2012.