On the Online Frank-Wolfe Algorithms for Convex and Non-convex Optimizations

In this paper, we consider online variants of the classical Frank-Wolfe algorithm for minimizing regret with a stochastic cost. The online algorithms require only simple iterative updates and a non-adaptive step size rule, in contrast to the hybrid schemes commonly considered in the literature. Several new results are derived for convex and non-convex losses. When the stochastic cost is strongly convex and either the optimal solution lies in the interior of the constraint set or the constraint set is a polytope, the regret bound and anytime optimality are shown to be ${\cal O}( \log^3 T / T )$ and ${\cal O}( \log^2 T / T)$, respectively, where $T$ is the number of rounds played. These results rely on an improved analysis of stochastic Frank-Wolfe algorithms. Moreover, the online algorithms are shown to converge even when the loss is non-convex, i.e., they find a stationary point of the time-varying/stochastic loss at a rate of ${\cal O}(\sqrt{1/T})$. Numerical experiments on realistic data sets support our theoretical claims.
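To illustrate the "simple iterative update and non-adaptive step size rule" referred to above, the following is a minimal Python sketch of an online Frank-Wolfe loop. The names `grad_oracle` and `lmo`, the $\ell_1$-ball constraint set, and the step rule $\gamma_t = 2/(t+2)$ are illustrative assumptions, not the exact scheme analyzed in the paper.

```python
import numpy as np

def online_frank_wolfe(grad_oracle, lmo, x0, T, step=lambda t: 2.0 / (t + 2)):
    """Sketch of an online/stochastic Frank-Wolfe loop (assumed interface).

    grad_oracle(x, t): (possibly noisy) gradient of the round-t loss at x.
    lmo(g): returns argmin_{s in C} <g, s> over the constraint set C.
    step(t): non-adaptive step size rule, e.g. gamma_t = 2 / (t + 2).
    """
    x = np.asarray(x0, dtype=float)
    for t in range(T):
        g = grad_oracle(x, t)            # stochastic gradient at round t
        s = lmo(g)                       # linear minimization oracle over C
        gamma = step(t)                  # non-adaptive step size
        x = (1 - gamma) * x + gamma * s  # convex combination keeps x feasible
    return x

def l1_ball_lmo(g, radius=1.0):
    """LMO for the l1 ball: all mass on the coordinate with largest |g_i|."""
    i = int(np.argmax(np.abs(g)))
    s = np.zeros_like(g, dtype=float)
    s[i] = -radius * np.sign(g[i])
    return s
```

The key point of the projection-free update is that each round costs only one linear minimization over the constraint set (no projection), and the new iterate remains feasible because it is a convex combination of feasible points.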
