Primal-Dual Block Frank-Wolfe

We propose a variant of the Frank-Wolfe algorithm for solving a class of sparse/low-rank optimization problems. Our formulation includes the Elastic Net, regularized SVMs, and phase retrieval as special cases. The proposed Primal-Dual Block Frank-Wolfe algorithm reduces the per-iteration cost while maintaining a linear convergence rate. The per-iteration cost of our method depends on the structural complexity of the solution (i.e., its sparsity or rank) rather than on the ambient dimension. We empirically show that our algorithm outperforms state-of-the-art methods on (multi-class) classification tasks.
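The low per-iteration cost of Frank-Wolfe-type methods comes from their cheap linear minimization oracle, which over an ℓ1 ball touches only a single coordinate per step and keeps iterates sparse. As a point of reference only, here is a minimal sketch of the classical Frank-Wolfe update over an ℓ1 ball; this is the base method, not the proposed primal-dual block variant, and the function name, step-size rule, and problem instance are illustrative assumptions.

```python
import numpy as np

def frank_wolfe_l1(grad, x0, tau, n_iters=100):
    """Classical Frank-Wolfe over the l1 ball of radius tau (illustrative sketch).

    The linear minimization oracle for the l1 ball only needs the coordinate
    of the gradient with largest magnitude, so each iterate is a convex
    combination of at most n_iters signed basis vectors (hence sparse).
    """
    x = x0.copy()
    for t in range(n_iters):
        g = grad(x)
        # LMO: vertex of the l1 ball most aligned with -g
        i = np.argmax(np.abs(g))
        s = np.zeros_like(x)
        s[i] = -tau * np.sign(g[i])
        gamma = 2.0 / (t + 2.0)  # standard diminishing step size
        x = (1.0 - gamma) * x + gamma * s
    return x

# Example usage on a toy sparse least-squares problem (hypothetical data):
# grad(x) = A.T @ (A @ x - b)
A = np.random.randn(50, 200)
b = np.random.randn(50)
x_hat = frank_wolfe_l1(lambda x: A.T @ (A @ x - b), np.zeros(200), tau=5.0)
```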
