Primal-Dual Block Frank-Wolfe

We propose a variant of the Frank-Wolfe algorithm for solving a class of sparse/low-rank optimization problems. Our formulation includes Elastic Net, regularized SVMs and phase retrieval as special cases. The proposed Primal-Dual Block Frank-Wolfe algorithm reduces the per-iteration cost while maintaining linear convergence rate. The per iteration cost of our method depends on the structural complexity of the solution (i.e. sparsity/low-rank) instead of the ambient dimension. We empirically show that our algorithm outperforms the state-of-the-art methods on (multi-class) classification tasks.

[1]  Inderjit S. Dhillon,et al.  Coordinate-wise Power Method , 2016, NIPS.

[2]  Yurii Nesterov,et al.  Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.

[3]  Martin Jaggi,et al.  Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization , 2013, ICML.

[4]  Massimiliano Pontil,et al.  Convex multi-task feature learning , 2008, Machine Learning.

[5]  Lin Xiao,et al.  Exploiting Strong Convexity from Data with Primal-Dual First-Order Algorithms , 2017, ICML.

[6]  Tong Zhang,et al.  Accelerating Stochastic Gradient Descent using Predictive Variance Reduction , 2013, NIPS.

[7]  Ping Li,et al.  Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS) , 2014, NIPS.

[8]  Donald Goldfarb,et al.  Linear Convergence of Stochastic Frank Wolfe Variants , 2017, AISTATS.

[9]  Martin Jaggi,et al.  On the Global Linear Convergence of Frank-Wolfe Optimization Variants , 2015, NIPS.

[10]  Yurii Nesterov,et al.  Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems , 2012, SIAM J. Optim..

[11]  Martin Jaggi,et al.  Efficient Greedy Coordinate Descent for Composite Problems , 2019, AISTATS.

[12]  Shai Shalev-Shwartz,et al.  Stochastic dual coordinate ascent methods for regularized loss , 2012, J. Mach. Learn. Res..

[13]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[14]  Zaïd Harchaoui,et al.  Lifted coordinate descent for learning with trace-norm regularization , 2012, AISTATS.

[15]  Subhash Khot,et al.  Hardness of approximating the shortest vector problem in lattices , 2004, 45th Annual IEEE Symposium on Foundations of Computer Science.

[16]  Mark W. Schmidt,et al.  Coordinate Descent Converges Faster with the Gauss-Southwell Rule Than Random Selection , 2015, ICML.

[17]  Pradeep Ravikumar,et al.  Doubly Greedy Primal-Dual Coordinate Descent for Sparse Empirical Risk Minimization , 2017, ICML.

[18]  S. Kakade,et al.  On the duality of strong convexity and strong smoothness : Learning applications and matrix regularization , 2009 .

[19]  Elad Hazan,et al.  Faster Rates for the Frank-Wolfe Method over Strongly-Convex Sets , 2014, ICML.

[20]  Pradeep Ravikumar,et al.  Nearest Neighbor based Greedy Coordinate Descent , 2011, NIPS.

[21]  Yi Zhou,et al.  Conditional Gradient Sliding for Convex Optimization , 2016, SIAM J. Optim..

[22]  Piyush Kumar,et al.  A Linearly Convergent Linear-Time First-Order Algorithm for Support Vector Classification with a Core Set Result , 2011, INFORMS J. Comput..

[23]  Mark W. Schmidt,et al.  Block-Coordinate Frank-Wolfe Optimization for Structural SVMs , 2012, ICML.

[24]  Chih-Jen Lin,et al.  Training and Testing Low-degree Polynomial Data Mappings via Linear SVM , 2010, J. Mach. Learn. Res..

[25]  Yonina C. Eldar,et al.  Phase Retrieval via Matrix Completion , 2011, SIAM Rev..

[26]  Yuchen Zhang,et al.  Stochastic Primal-Dual Coordinate Method for Regularized Empirical Risk Minimization , 2014, ICML.

[27]  P. Pardalos,et al.  Minimax and applications , 1995 .

[28]  Claudio Sartori,et al.  A novel Frank-Wolfe algorithm. Analysis and applications to large-scale SVM training , 2013, Inf. Sci..

[29]  Weizhu Chen,et al.  DSCOVR: Randomized Primal-Dual Block Coordinate Algorithms for Asynchronous Distributed Optimization , 2017, J. Mach. Learn. Res..

[30]  Benjamin Recht,et al.  Random Features for Large-Scale Kernel Machines , 2007, NIPS.

[31]  Yuanzhi Li,et al.  Linear Convergence of a Frank-Wolfe Type Algorithm over Trace-Norm Balls , 2017, NIPS.

[32]  Alexandre d'Aspremont,et al.  Restarting Frank–Wolfe: Faster Rates under Hölderian Error Bounds , 2018, Journal of Optimization Theory and Applications.

[33]  A. Weintraub,et al.  Accelerating convergence of the Frank-Wolfe algorithm☆ , 1985 .

[34]  Haipeng Luo,et al.  Variance-Reduced and Projection-Free Stochastic Optimization , 2016, ICML.

[35]  Paul Tseng,et al.  Trace Norm Regularization: Reformulations, Algorithms, and Multi-Task Learning , 2010, SIAM J. Optim..

[36]  Wei Hu,et al.  Linear Convergence of the Primal-Dual Gradient Method for Convex-Concave Saddle Point Problems without Strong Convexity , 2018, AISTATS.

[37]  G. Meyer Accelerated Frank–Wolfe Algorithms , 1974 .