Stochastic Parallel Block Coordinate Descent for Large-Scale Saddle Point Problems

We consider convex-concave saddle point problems with a separable structure and non-strongly convex functions. We propose an efficient stochastic block coordinate descent method that uses adaptive primal-dual updates, which enables flexible parallel optimization for large-scale problems. Our method combines the efficiency and flexibility of block coordinate descent methods with the simplicity of primal-dual methods, while exploiting the structure of the separable convex-concave saddle point problem. It applies to a wide range of machine learning problems, including robust principal component analysis, Lasso, and feature selection by group Lasso. We demonstrate, both theoretically and empirically, that it performs significantly better than state-of-the-art methods in all of these applications.
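
To make the saddle point view concrete, here is a minimal sketch of how Lasso, one of the applications named above, can be written as a convex-concave saddle point problem and solved with a basic first-order primal-dual iteration of the Chambolle-Pock type. This is not the stochastic block coordinate method proposed in the paper: it is the plain deterministic baseline that updates all coordinates at every step. The function names, step sizes, and iteration count are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_objective(A, b, x, lam):
    """Primal Lasso objective 0.5 * ||Ax - b||^2 + lam * ||x||_1."""
    r = A @ x - b
    return 0.5 * (r @ r) + lam * np.abs(x).sum()


def pdhg_lasso(A, b, lam, n_iter=5000):
    """Deterministic primal-dual iteration for the saddle point problem

        min_x max_y  <A x, y> - (0.5 * ||y||^2 + <b, y>) + lam * ||x||_1,

    which is equivalent to Lasso. Hypothetical helper for illustration only.
    """
    m, n = A.shape
    L = np.linalg.norm(A, 2)      # spectral norm of A
    tau = sigma = 0.99 / L        # assumed step sizes, so tau * sigma * L^2 < 1
    x = np.zeros(n)
    x_bar = x.copy()
    y = np.zeros(m)
    for _ in range(n_iter):
        # Dual step: closed-form prox of sigma * (0.5 * ||y||^2 + <b, y>).
        y = (y + sigma * (A @ x_bar) - sigma * b) / (1.0 + sigma)
        # Primal step: prox of tau * lam * ||.||_1 is soft thresholding.
        x_new = soft_threshold(x - tau * (A.T @ y), tau * lam)
        # Extrapolation of the primal variable.
        x_bar = 2.0 * x_new - x
        x = x_new
    return x, y
```

Because the l1 term is a sum over coordinates, the primal step separates across coordinates (and hence across blocks of coordinates). That separability is the kind of structure a block coordinate scheme can exploit by updating only a random subset of blocks per iteration, possibly in parallel.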
