Projection Free Online Learning over Smooth Sets

The projection operation is a crucial step in applying Online Gradient Descent (OGD) and its stochastic counterpart SGD. Unfortunately, in some settings the projection is computationally demanding and prevents us from applying OGD. In this work we focus on the special case where the constraint set is smooth and we have access to value and gradient oracles of the constraint function. Under these assumptions we design a new approximate projection operation that requires only logarithmically many calls to these oracles. We further show that combining OGD with this new approximate projection yields a projection-free variant that recovers the standard regret rates of the fully projected version, in both the convex and strongly convex online settings.
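For intuition, the following is a minimal, hypothetical sketch of OGD in which the exact Euclidean projection is replaced by a cheap feasibility-restoring step: a binary line search toward a strictly feasible interior point, using only value-oracle queries to the constraint function g. This is an illustrative assumption, not the procedure from the paper; the names approx_project, ogd_with_approx_projection, and all parameters are made up for the example.

import numpy as np

def approx_project(y, g, center, tol=1e-6):
    # Approximately restore feasibility with respect to {x : g(x) <= 0} by
    # binary search on the segment between a strictly feasible point `center`
    # (g(center) < 0) and the query point y. The number of value-oracle calls
    # to g is logarithmic in 1/tol. Illustrative sketch only, not the paper's
    # approximate projection.
    if g(y) <= 0:                          # already feasible, nothing to do
        return y
    lo, hi = 0.0, 1.0                      # weight of y in the convex combination
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        x = (1.0 - mid) * center + mid * y
        if g(x) <= 0:
            lo = mid                       # still feasible: move closer to y
        else:
            hi = mid                       # infeasible: back off toward center
    return (1.0 - lo) * center + lo * y

def ogd_with_approx_projection(grad_loss, g, center, x0, T, eta):
    # OGD where each gradient step is followed by the approximate projection
    # above instead of an exact Euclidean projection (illustrative only).
    # grad_loss(t, x) returns the gradient of the t-th loss at x; eta(t) is the
    # step size at round t.
    x = np.array(x0, dtype=float)
    iterates = []
    for t in range(1, T + 1):
        x = x - eta(t) * grad_loss(t, x)   # gradient step on the t-th loss
        x = approx_project(x, g, center)   # cheap approximate projection
        iterates.append(x.copy())
    return iterates

The binary search converges to a point on the segment that lies (approximately) on the boundary of the feasible set, which is why only logarithmically many oracle calls are needed per round under these assumptions.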
