Efficient Projection-Free Online Methods with Stochastic Recursive Gradient

This paper focuses on projection-free methods for solving smooth Online Convex Optimization (OCO) problems. Existing projection-free methods either achieve suboptimal regret bounds or incur high per-iteration computational costs. To fill this gap, we propose two efficient projection-free online methods, ORGFW and MORGFW, for solving stochastic and adversarial OCO problems, respectively. By employing a recursive gradient estimator, our methods achieve optimal regret bounds (up to a logarithmic factor) while maintaining low per-iteration computational costs. Experimental results demonstrate the efficiency of the proposed methods compared with state-of-the-art alternatives.
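To make the idea concrete, the following is a minimal sketch of a recursive-gradient Frank-Wolfe loop in the spirit of ORGFW. It is illustrative only: the step-size schedule `sigma`, the mixing weight `rho`, the `grad_fn` interface, and the choice of an ℓ1-ball feasible set with its linear minimization oracle `lmo_l1` are all assumptions for the example, not the paper's exact specification. The key points it shows are (i) the SARAH/STORM-style recursive estimator `d_t = g_t(x_t) + (1 - rho)(d_{t-1} - g_t(x_{t-1}))` and (ii) that each round needs only one linear optimization over the constraint set instead of a projection.

```python
import numpy as np

def lmo_l1(d, radius=1.0):
    # Linear minimization oracle over the l1 ball:
    # argmin_{||v||_1 <= radius} <v, d> is a signed vertex of the ball.
    i = int(np.argmax(np.abs(d)))
    v = np.zeros_like(d)
    v[i] = -radius * np.sign(d[i])
    return v

def orgfw_sketch(grad_fn, x0, T, radius=1.0):
    """Illustrative recursive-gradient Frank-Wolfe loop (not the paper's
    exact algorithm). grad_fn(t, x) returns a (possibly stochastic)
    gradient of the round-t loss at x; the rho/sigma schedules below
    are assumed for the example."""
    x = np.asarray(x0, dtype=float).copy()
    d = grad_fn(0, x)            # initialize the estimator with one gradient
    x_prev = x.copy()
    for t in range(1, T + 1):
        rho = 1.0 / (t + 1) ** (2.0 / 3.0)
        # Recursive gradient estimator:
        # d <- g_t(x_t) + (1 - rho) * (d - g_t(x_{t-1}))
        g_new, g_old = grad_fn(t, x), grad_fn(t, x_prev)
        d = g_new + (1.0 - rho) * (d - g_old)
        v = lmo_l1(d, radius)    # one linear optimization, no projection
        sigma = 1.0 / (t + 1)
        x_prev = x.copy()
        x = x + sigma * (v - x)  # convex combination stays feasible
    return x
```

A quick usage example: with quadratic losses `f_t(x) = 0.5 * ||x - b||^2` for a fixed target `b` inside the unit ℓ1 ball, the iterate drifts toward `b` while never leaving the feasible set, since each update is a convex combination of feasible points.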
