Randomized Block Frank–Wolfe for Convergent Large-Scale Learning

Owing to their low-complexity iterations, Frank–Wolfe (FW) solvers are well suited for various large-scale learning tasks. When block-separable constraints are present, randomized block FW (RB-FW) has been shown to further reduce complexity by updating only a fraction of coordinate blocks per iteration. To circumvent the limitations of existing methods, this paper develops step sizes for RB-FW that enable a flexible selection of the number of blocks to update per iteration while ensuring convergence and feasibility of the iterates. To this end, convergence rates of RB-FW are established through computational bounds on a primal suboptimality measure and on the duality gap. The novel bounds extend the existing convergence analysis, which only applies to a step-size sequence that does not generally lead to feasible iterates. Furthermore, two classes of step-size sequences that guarantee feasibility of the iterates are also proposed to enhance flexibility in choosing decay rates. The novel convergence results are markedly broadened to also encompass nonconvex objectives, and further assert that RB-FW with exact line-search reaches a stationary point at rate $\mathcal{O}(1/\sqrt{t})$. Performance of RB-FW with different step sizes and number of blocks is demonstrated in two applications, namely charging of electrical vehicles and structural support vector machines. Extensive simulated tests demonstrate the performance improvement of RB-FW relative to existing randomized single-block FW methods.

[1]  Kenneth L. Clarkson,et al.  Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm , 2008, SODA '08.

[2]  Simon Lacoste-Julien,et al.  Convergence Rate of Frank-Wolfe for Non-Convex Objectives , 2016, ArXiv.

[3]  Jieping Ye,et al.  Tensor Completion for Estimating Missing Values in Visual Data , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Martin Jaggi,et al.  Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization , 2013, ICML.

[5]  Zaïd Harchaoui,et al.  Conditional gradient algorithms for norm-regularized smooth convex optimization , 2013, Math. Program..

[6]  Martin Jaggi,et al.  A Simple Algorithm for Nuclear Norm Regularized Problems , 2010, ICML.

[7]  Georgios B. Giannakis,et al.  Monitoring and Optimization for Power Grids: A Signal Processing Perspective , 2013, IEEE Signal Processing Magazine.

[8]  Georgios B. Giannakis,et al.  Scalable Electric Vehicle Charging Protocols , 2015, IEEE Transactions on Power Systems.

[9]  Mark W. Schmidt,et al.  Block-Coordinate Frank-Wolfe Optimization for Structural SVMs , 2012, ICML.

[10]  Gökhan BakIr,et al.  Predicting Structured Data , 2008 .

[11]  Gang Wang,et al.  Power system state estimation via feasible point pursuit , 2017, 2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP).

[12]  Fredrik Lindsten,et al.  Sequential Kernel Herding: Frank-Wolfe Optimization for Particle Filtering , 2015, AISTATS.

[13]  Michael A. Saunders,et al.  Atomic Decomposition by Basis Pursuit , 1998, SIAM J. Sci. Comput..

[14]  Anton Osokin,et al.  Minding the Gaps for Block Frank-Wolfe Optimization of Structured SVMs , 2016, ICML.

[15]  Sophia Decker,et al.  Approximate Methods In Optimization Problems , 2016 .

[16]  Suvrit Sra,et al.  Reflection methods for user-friendly submodular optimization , 2013, NIPS.

[17]  Nikos D. Sidiropoulos,et al.  From K-Means to Higher-Way Co-Clustering: Multilinear Decomposition With Sparse Latent Factors , 2013, IEEE Transactions on Signal Processing.

[18]  José R. Dorronsoro,et al.  Group Fused Lasso , 2013, ICANN.

[19]  Yonina C. Eldar,et al.  Solving Systems of Random Quadratic Equations via Truncated Amplitude Flow , 2016, IEEE Transactions on Information Theory.

[20]  Ben Taskar,et al.  Max-Margin Markov Networks , 2003, NIPS.

[21]  V. F. Demʹi︠a︡nov,et al.  Approximate methods in optimization problems , 1970 .

[22]  Thomas Hofmann,et al.  Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..

[23]  Paul Grigas,et al.  New analysis and results for the Frank–Wolfe method , 2013, Mathematical Programming.

[24]  Philip Wolfe,et al.  An algorithm for quadratic programming , 1956 .

[25]  Eric P. Xing,et al.  Parallel and Distributed Block-Coordinate Frank-Wolfe Algorithms , 2014, ICML.

[26]  Gang Wang,et al.  Sparse Phase Retrieval via Truncated Amplitude Flow , 2016, IEEE Transactions on Signal Processing.

[27]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.