Projection-free Online Learning

The computational bottleneck in applying online learning to massive datasets is usually the projection step. We present efficient online learning algorithms that eschew projections in favor of much more efficient linear optimization steps using the Frank-Wolfe technique. We obtain a range of regret bounds for online convex optimization, with better bounds for special cases such as stochastic online smooth convex optimization. Besides the computational advantage, other desirable features of our algorithms are that they are parameter-free in the stochastic case and produce sparse decisions. We apply our algorithms to computationally intensive applications of collaborative filtering, and show the theoretical improvements to be clearly visible on standard datasets.
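
The key mechanism, sketched below: instead of projecting an iterate back onto the feasible set after each gradient step, every round makes a single call to a linear optimization oracle over the set and moves by a convex combination, which keeps the iterate feasible without any projection. The following Python sketch illustrates this projection-free update over an l1 ball, whose linear oracle is just a coordinate lookup; the grad_fn interface, the choice of domain, and the 1/t step size are illustrative assumptions, not the paper's exact algorithm or analysis.

import numpy as np

def linear_oracle_l1(grad, radius=1.0):
    """Linear minimization over the l1 ball: argmin_{||v||_1 <= radius} <grad, v>.
    The minimizer is a signed vertex along the largest-magnitude gradient coordinate."""
    v = np.zeros_like(grad)
    i = int(np.argmax(np.abs(grad)))
    v[i] = -radius * np.sign(grad[i])
    return v

def online_frank_wolfe(grad_fn, dim, T, radius=1.0):
    """Projection-free online updates: each round costs one gradient evaluation and
    one call to the linear oracle; no projection onto the feasible set is taken.

    grad_fn(t, x) returns the gradient of the round-t loss at x (this interface,
    the l1-ball domain, and the 1/t step size are illustrative assumptions)."""
    x = np.zeros(dim)              # any feasible starting point
    grad_sum = np.zeros(dim)       # gradients of all losses seen so far
    decisions = []
    for t in range(1, T + 1):
        decisions.append(x.copy())                   # play the current decision
        grad_sum += grad_fn(t, x)                    # observe the round-t loss gradient
        v = linear_oracle_l1(grad_sum / t, radius)   # linear step instead of projection
        gamma = 1.0 / t                              # step size (illustrative schedule)
        x = (1.0 - gamma) * x + gamma * v            # convex combination stays feasible
    return decisions

For the trace-norm ball that arises in the collaborative filtering application, the analogous oracle returns a rank-one matrix built from a top singular vector pair, so each round avoids the full singular value decomposition that an exact projection would require; this is also what makes the produced decisions sparse (low-rank).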
