Efficient Projection-Free Online Methods with Stochastic Recursive Gradient

This paper focuses on projection-free methods for solving smooth Online Convex Optimization (OCO) problems. Existing projection-free methods either achieve suboptimal regret bounds or incur high per-iteration computational costs. To fill this gap, we propose two efficient projection-free online methods, ORGFW and MORGFW, for solving stochastic and adversarial OCO problems, respectively. By employing a recursive gradient estimator, our methods achieve optimal regret bounds (up to a logarithmic factor) while maintaining low per-iteration computational costs. Experimental results demonstrate the efficiency of the proposed methods compared with state-of-the-art alternatives.
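To make the idea concrete, the following is a minimal sketch of a recursive-gradient Frank-Wolfe loop in the spirit of ORGFW. It is illustrative only: the step-size schedule `sigma`, the mixing weight `rho`, the `grad_fn` interface, and the choice of an ℓ1-ball feasible set with its linear minimization oracle `lmo_l1` are all assumptions for the example, not the paper's exact specification. The key points it shows are (i) the SARAH/STORM-style recursive estimator `d_t = g_t(x_t) + (1 - rho)(d_{t-1} - g_t(x_{t-1}))` and (ii) that each round needs only one linear optimization over the constraint set instead of a projection.

```python
import numpy as np

def lmo_l1(d, radius=1.0):
    # Linear minimization oracle over the l1 ball:
    # argmin_{||v||_1 <= radius} <v, d> is a signed vertex of the ball.
    i = int(np.argmax(np.abs(d)))
    v = np.zeros_like(d)
    v[i] = -radius * np.sign(d[i])
    return v

def orgfw_sketch(grad_fn, x0, T, radius=1.0):
    """Illustrative recursive-gradient Frank-Wolfe loop (not the paper's
    exact algorithm). grad_fn(t, x) returns a (possibly stochastic)
    gradient of the round-t loss at x; the rho/sigma schedules below
    are assumed for the example."""
    x = np.asarray(x0, dtype=float).copy()
    d = grad_fn(0, x)            # initialize the estimator with one gradient
    x_prev = x.copy()
    for t in range(1, T + 1):
        rho = 1.0 / (t + 1) ** (2.0 / 3.0)
        # Recursive gradient estimator:
        # d <- g_t(x_t) + (1 - rho) * (d - g_t(x_{t-1}))
        g_new, g_old = grad_fn(t, x), grad_fn(t, x_prev)
        d = g_new + (1.0 - rho) * (d - g_old)
        v = lmo_l1(d, radius)    # one linear optimization, no projection
        sigma = 1.0 / (t + 1)
        x_prev = x.copy()
        x = x + sigma * (v - x)  # convex combination stays feasible
    return x
```

A quick usage example: with quadratic losses `f_t(x) = 0.5 * ||x - b||^2` for a fixed target `b` inside the unit ℓ1 ball, the iterate drifts toward `b` while never leaving the feasible set, since each update is a convex combination of feasible points.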
