Stochastic Frank-Wolfe for Constrained Finite-Sum Minimization

We propose a novel Stochastic Frank-Wolfe (a.k.a. conditional gradient) algorithm for constrained smooth finite-sum minimization with a generalized linear prediction/structure. This class of problems includes empirical risk minimization with sparse, low-rank, or other structured constraints. The proposed method is simple to implement, does not require step-size tuning, and has a constant per-iteration cost that is independent of the dataset size. Furthermore, as a byproduct of the method, we obtain a stochastic estimator of the Frank-Wolfe gap that can be used as a stopping criterion. Depending on the setting, the proposed method matches or improves on the best known computational guarantees for Stochastic Frank-Wolfe algorithms. Benchmarks on several datasets highlight different regimes in which the proposed method exhibits faster empirical convergence than related methods. Finally, we provide an implementation of all considered methods in an open-source package.
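As background on the stopping criterion: for a smooth objective f and a compact constraint set C, the Frank-Wolfe gap at a feasible point x is the standard optimality certificate for conditional gradient methods (a standard definition, not specific to this paper):

    g(x) = \max_{s \in C} \, \langle \nabla f(x), \, x - s \rangle

It is nonnegative, vanishes exactly at stationary points, and for convex f upper-bounds the suboptimality f(x) - f(x*). Computing it exactly requires a full gradient, which is why a cheap stochastic estimator of it is useful in the finite-sum setting.

To make the algorithmic template concrete, below is a minimal sketch of a generic stochastic Frank-Wolfe iteration, written here for a least-squares objective over an l1-ball constraint. The abstract does not specify the paper's gradient estimator, constraint set, or gap estimator, so the problem setup, the lmo_l1 helper, and the single-sample gap surrogate are all illustrative assumptions, not the proposed method.

    import numpy as np

    def lmo_l1(grad, radius=1.0):
        """Linear minimization oracle for the l1 ball: the minimizer of
        <grad, s> over ||s||_1 <= radius is a signed, scaled basis vector."""
        i = np.argmax(np.abs(grad))
        s = np.zeros_like(grad)
        s[i] = -radius * np.sign(grad[i])
        return s

    def stochastic_frank_wolfe(A, b, n_iters=1000, radius=1.0, tol=1e-8, seed=0):
        """Generic (not the paper's) stochastic FW on f(x) = (1/2n)||Ax - b||^2
        over the l1 ball of the given radius, one sampled data point per step."""
        rng = np.random.default_rng(seed)
        n, d = A.shape
        x = np.zeros(d)                            # feasible start: 0 lies in the ball
        for t in range(n_iters):
            i = rng.integers(n)                    # sample one data point
            g = (A[i] @ x - b[i]) * A[i]           # unbiased one-sample gradient
            s = lmo_l1(g, radius)                  # LMO call; no projection needed
            gap_est = g @ (x - s)                  # noisy single-sample gap surrogate;
            if gap_est <= tol:                     # it can be tiny for one sample even
                break                              # far from optimal, so only a heuristic
            gamma = 2.0 / (t + 2.0)                # classical step size, no tuning
            x = (1.0 - gamma) * x + gamma * s      # convex combination keeps x feasible
        return x

Two remarks on this sketch. First, each step moves toward a single vertex of the l1 ball, so the iterate has at most t nonzero coordinates after t steps; this is the mechanism behind the sparse structured solutions mentioned in the abstract. Second, with plain one-sample gradients such a scheme is known to require variance reduction or growing batch sizes for convergence guarantees, which is precisely why a gradient estimator with constant per-iteration cost, as claimed in the abstract, is of interest.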
