Conditional Gradient Methods via Stochastic Path-Integrated Differential Estimator

We propose a novel class of variance-reduced stochastic conditional gradient methods. By adapting the recent stochastic path-integrated differential estimator (SPIDER) technique of Fang et al. (2018) to the classical Frank-Wolfe (FW) method, we introduce SPIDER-FW for finite-sum minimization as well as the more general expectation-minimization problem. SPIDER-FW enjoys superior complexity guarantees in the non-convex setting, while matching the best-known FW variants in the convex case. We also extend our framework à la the conditional gradient sliding (CGS) method of Lan & Zhou (2016), and propose SPIDER-CGS.
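To make the template concrete, below is a minimal Python sketch of a SPIDER-type Frank-Wolfe loop in the spirit of the abstract: each epoch is anchored with a full (or large-batch) gradient, between iterations the estimator accumulates minibatch gradient differences along the optimization path, and the update itself only calls a linear minimization oracle (LMO) rather than a projection. The function names (spider_fw, grad_full, grad_batch, lmo), the step-size schedule, epoch length, and batch size are illustrative assumptions, not the paper's exact algorithm or parameter choices.

import numpy as np

def spider_fw(grad_full, grad_batch, lmo, x0, n_epochs=20, epoch_len=50,
              step=lambda t: 2.0 / (t + 2)):
    """Frank-Wolfe with a SPIDER-style variance-reduced gradient estimator (sketch).

    grad_full(x)        -- full (or large-batch) gradient, recomputed once per epoch
    grad_batch(x, seed) -- minibatch gradient; the same seed must draw the same
                           samples, so consecutive differences telescope
    lmo(v)              -- linear minimization oracle: argmin over s in C of <v, s>
    """
    rng = np.random.default_rng(0)
    x = np.asarray(x0, dtype=float).copy()
    t = 0
    for _ in range(n_epochs):
        v = grad_full(x)  # anchor the path-integrated estimator at the epoch start
        for i in range(epoch_len):
            d = lmo(v) - x                  # projection-free descent direction
            x_prev, x = x, x + step(t) * d  # convex combination stays feasible
            t += 1
            if i < epoch_len - 1:           # SPIDER update between consecutive iterates
                seed = int(rng.integers(2**31))
                v = v + grad_batch(x, seed) - grad_batch(x_prev, seed)
    return x

# Toy usage (hypothetical problem): least squares over the probability simplex.
rng = np.random.default_rng(1)
A = rng.normal(size=(200, 50))
b = A @ (np.ones(50) / 50.0)

def lmo(v):
    s = np.zeros_like(v)
    s[np.argmin(v)] = 1.0  # best simplex vertex for a linear objective
    return s

grad_full = lambda x: A.T @ (A @ x - b) / len(b)

def grad_batch(x, seed):
    idx = np.random.default_rng(seed).integers(0, len(b), size=16)
    return A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)

x_hat = spider_fw(grad_full, grad_batch, lmo, x0=np.ones(50) / 50.0)
print(0.5 * np.linalg.norm(A @ x_hat - b) ** 2 / len(b))  # final objective value

One detail worth noting in the sketch: the same minibatch (here, the same seed) is used when evaluating the gradient at both the current and the previous iterate; this coupling is what makes the differences telescope and the estimator's variance shrink along the path.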

[1] Elad Hazan. Sparse Approximate Solutions to Semidefinite Programs. LATIN, 2008.

[2] Aryan Mokhtari et al. Stochastic Conditional Gradient Methods: From Convex Minimization to Submodular Maximization. J. Mach. Learn. Res., 2018.

[3] Nicolas Le Roux et al. A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets. NIPS, 2012.

[4] Aaron Defazio et al. SAGA: A Fast Incremental Gradient Method with Support for Non-Strongly Convex Composite Objectives. NIPS, 2014.

[5] Chao Qu et al. Non-convex Conditional Gradient Sliding. ICML, 2018.

[6] Marguerite Frank and Philip Wolfe. An Algorithm for Quadratic Programming. Naval Research Logistics Quarterly, 1956.

[7] Rie Johnson and Tong Zhang. Accelerating Stochastic Gradient Descent Using Predictive Variance Reduction. NIPS, 2013.

[8] Simon Lacoste-Julien et al. Block-Coordinate Frank-Wolfe Optimization for Structural SVMs. ICML, 2013.

[9] Yaoliang Yu et al. Generalized Conditional Gradient for Sparse Estimation. J. Mach. Learn. Res., 2017.

[10] Martin Jaggi. Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization. ICML, 2013.

[11] Haihao Lu and Robert M. Freund. Generalized Stochastic Frank–Wolfe Algorithm with Stochastic "Substitute" Gradient for Structured Convex Optimization. Mathematical Programming, 2018.

[12] Guanghui Lan. The Complexity of Large-Scale Convex Programming under a Linear Optimization Oracle. arXiv:1309.5550, 2013.

[13] A. S. Nemirovsky and D. B. Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley, 1983.

[14] Cong Fang et al. SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator. NeurIPS, 2018.

[15] Lam M. Nguyen et al. SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient. ICML, 2017.

[16] Mehrdad Mahdavi et al. Mixed Optimization for Smooth Functions. NIPS, 2013.

[17] Guanghui Lan and Yi Zhou. Conditional Gradient Sliding for Convex Optimization. SIAM J. Optim., 2016.

[18] Alekh Agarwal and Léon Bottou. A Lower Bound for the Optimization of Finite Sums. ICML, 2015.

[19] Sashank J. Reddi et al. Stochastic Frank-Wolfe Methods for Nonconvex Optimization. Allerton, 2016.

[20] Elad Hazan and Haipeng Luo. Variance-Reduced and Projection-Free Stochastic Optimization. ICML, 2016.

[21] Sathya N. Ravi et al. Constrained Deep Learning Using Conditional Gradient and Applications in Computer Vision. arXiv, 2018.

[22] Hamed Hassani et al. Stochastic Conditional Gradient++. SIAM J. Optim., 2019.

[23] Zebang Shen et al. Complexities in Projection-Free Stochastic Non-convex Minimization. AISTATS, 2019.

[24] Simon Lacoste-Julien. Convergence Rate of Frank-Wolfe for Non-Convex Objectives. arXiv, 2016.

[25] Dimitri P. Bertsekas. Nonlinear Programming. Athena Scientific, 1997.

[26] Yurii Nesterov. A Method for Solving the Convex Programming Problem with Convergence Rate O(1/k^2). Dokl. Akad. Nauk SSSR, 1983.

[27] Zhe Wang et al. SpiderBoost: A Class of Faster Variance-Reduced Algorithms for Nonconvex Optimization. arXiv, 2018.

[28] Elad Hazan and Satyen Kale. Projection-free Online Learning. ICML, 2012.