Faster Stochastic Alternating Direction Method of Multipliers for Nonconvex Optimization

In this paper, we propose a faster stochastic alternating direction method of multipliers (ADMM) for nonconvex optimization, called SPIDER-ADMM, which builds on the stochastic path-integrated differential estimator (SPIDER). We prove that SPIDER-ADMM achieves a record-breaking incremental first-order oracle (IFO) complexity of $\mathcal{O}(n+n^{1/2}\epsilon^{-1})$ for finding an $\epsilon$-approximate stationary point, which improves on the deterministic ADMM by a factor of $\mathcal{O}(n^{1/2})$, where $n$ denotes the sample size. As one of the major contributions of this paper, we provide a new theoretical analysis framework for nonconvex stochastic ADMM methods that yields the optimal IFO complexity. Based on this new framework, we resolve the previously open question of the optimal IFO complexity of the existing nonconvex SVRG-ADMM and SAGA-ADMM methods, and prove that they attain an IFO complexity of $\mathcal{O}(n+n^{2/3}\epsilon^{-1})$. Thus, SPIDER-ADMM improves on the existing stochastic ADMM methods by a factor of $\mathcal{O}(n^{1/6})$. Moreover, we extend SPIDER-ADMM to the online setting and propose a faster online SPIDER-ADMM. Our theoretical analysis shows that the online SPIDER-ADMM has an IFO complexity of $\mathcal{O}(\epsilon^{-\frac{3}{2}})$, which improves on the existing best result by a factor of $\mathcal{O}(\epsilon^{-\frac{1}{2}})$. Finally, experimental results on benchmark datasets validate that the proposed algorithms converge faster than existing ADMM algorithms for nonconvex optimization.
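For concreteness, the SPIDER estimator referenced above maintains a recursive gradient estimate built from mini-batch gradient differences. A minimal sketch of the standard recursion follows, where the restart period $q$ and the mini-batches $\mathcal{S}_t$ are generic placeholders and the paper's exact parameter choices may differ:

$$
v_t =
\begin{cases}
\nabla f(x_t), & t \equiv 0 \pmod{q},\\[4pt]
\dfrac{1}{|\mathcal{S}_t|}\displaystyle\sum_{i \in \mathcal{S}_t}\big(\nabla f_i(x_t) - \nabla f_i(x_{t-1})\big) + v_{t-1}, & \text{otherwise},
\end{cases}
$$

where $f = \frac{1}{n}\sum_{i=1}^{n} f_i$ is the smooth finite-sum component. Within a SPIDER-based ADMM, the estimate $v_t$ stands in for the full gradient $\nabla f(x_t)$ in the primal update, which is what reduces the per-iteration IFO cost.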
