On the Online Frank-Wolfe Algorithms for Convex and Non-convex Optimizations

In this paper, we consider online variants of the classical Frank-Wolfe algorithm for minimizing regret with a stochastic cost. The online algorithms require only simple iterative updates and a non-adaptive step size rule, in contrast to the hybrid schemes commonly considered in the literature. Several new results are derived for convex and non-convex losses. When the stochastic cost is strongly convex and either the optimal solution lies in the interior of the constraint set or the constraint set is a polytope, the regret bound and anytime optimality are shown to be ${\cal O}( \log^3 T / T )$ and ${\cal O}( \log^2 T / T)$, respectively, where $T$ is the number of rounds played. These results rely on an improved analysis of stochastic Frank-Wolfe algorithms. Moreover, the online algorithms are shown to converge even when the loss is non-convex, i.e., they find a stationary point of the time-varying/stochastic loss at a rate of ${\cal O}(\sqrt{1/T})$. Numerical experiments on realistic data sets support our theoretical claims.
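To illustrate the "simple iterative update and non-adaptive step size rule" referred to above, the following is a minimal Python sketch of an online Frank-Wolfe loop. The names `grad_oracle` and `lmo`, the $\ell_1$-ball constraint set, and the step rule $\gamma_t = 2/(t+2)$ are illustrative assumptions, not the exact scheme analyzed in the paper.

```python
import numpy as np

def online_frank_wolfe(grad_oracle, lmo, x0, T, step=lambda t: 2.0 / (t + 2)):
    """Sketch of an online/stochastic Frank-Wolfe loop (assumed interface).

    grad_oracle(x, t): (possibly noisy) gradient of the round-t loss at x.
    lmo(g): returns argmin_{s in C} <g, s> over the constraint set C.
    step(t): non-adaptive step size rule, e.g. gamma_t = 2 / (t + 2).
    """
    x = np.asarray(x0, dtype=float)
    for t in range(T):
        g = grad_oracle(x, t)            # stochastic gradient at round t
        s = lmo(g)                       # linear minimization oracle over C
        gamma = step(t)                  # non-adaptive step size
        x = (1 - gamma) * x + gamma * s  # convex combination keeps x feasible
    return x

def l1_ball_lmo(g, radius=1.0):
    """LMO for the l1 ball: all mass on the coordinate with largest |g_i|."""
    i = int(np.argmax(np.abs(g)))
    s = np.zeros_like(g, dtype=float)
    s[i] = -radius * np.sign(g[i])
    return s
```

The key point of the projection-free update is that each round costs only one linear minimization over the constraint set (no projection), and the new iterate remains feasible because it is a convex combination of feasible points.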
