Nonconvex Zeroth-Order Stochastic ADMM Methods with Lower Function Query Complexity

Zeroth-order (gradient-free) methods are a powerful class of optimization tools for many machine learning problems, because they require only function values (not gradients) during optimization. In particular, zeroth-order methods are well suited to complex problems such as black-box attacks and bandit feedback, where explicit gradients are difficult or infeasible to obtain. Although many zeroth-order methods have been developed recently, these approaches still suffer from two main drawbacks: 1) high function query complexity; 2) poor suitability for problems with complex penalties and constraints. To address these challenges, in this paper we propose a novel fast zeroth-order stochastic alternating direction method of multipliers (\emph{i.e.}, ZO-SPIDER-ADMM) with lower function query complexity for solving nonconvex problems with multiple nonsmooth penalties. Moreover, we prove that our ZO-SPIDER-ADMM has the optimal function query complexity of $O(dn + dn^{\frac{1}{2}}\epsilon^{-1})$ for finding an $\epsilon$-approximate local solution, where $n$ and $d$ denote the sample size and the dimension of the data, respectively. In particular, ZO-SPIDER-ADMM improves the existing best nonconvex zeroth-order ADMM methods by a factor of $O(d^{\frac{1}{3}}n^{\frac{1}{6}})$. Moreover, we propose a fast online variant (\emph{i.e.}, ZOO-SPIDER-ADMM). Our theoretical analysis shows that ZOO-SPIDER-ADMM has a function query complexity of $O(d\epsilon^{-\frac{3}{2}})$, which improves the existing best result by a factor of $O(\epsilon^{-\frac{1}{2}})$. Finally, we apply our algorithms to the task of structured adversarial attack on black-box deep neural networks to demonstrate their efficiency.
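The abstract does not spell out how a zeroth-order method turns function queries into a gradient surrogate, so the following is only a minimal Python/NumPy sketch of two ingredients it alludes to: a coordinate-wise finite-difference gradient estimator (which is why the dimension $d$ appears in every query-complexity bound) and a SPIDER-style recursive, variance-reduced estimate built from such queries. The function names, the smoothing parameter `mu`, and the quadratic toy loss are illustrative assumptions, not the paper's actual estimator or update rule.

```python
import numpy as np

def zo_coord_grad(f, x, mu=1e-4):
    """Coordinate-wise zeroth-order gradient estimate of f at x.

    Uses 2*d function queries: (f(x + mu*e_j) - f(x - mu*e_j)) / (2*mu)
    for each coordinate j. Purely illustrative; the paper's estimator
    and query schedule may differ.
    """
    d = x.size
    g = np.zeros(d)
    for j in range(d):
        e = np.zeros(d)
        e[j] = mu
        g[j] = (f(x + e) - f(x - e)) / (2.0 * mu)
    return g

def spider_zo_estimate(f_i, x, x_prev, v_prev, batch, mu=1e-4):
    """SPIDER-style recursive gradient estimate (assumed form):
    v_t = v_{t-1} + (1/|S|) * sum_{i in S} [g_i(x_t) - g_i(x_{t-1})],
    where each g_i is a zeroth-order estimate of the i-th component
    function's gradient.
    """
    v = v_prev.copy()
    for i in batch:
        v += (zo_coord_grad(lambda z: f_i(i, z), x, mu)
              - zo_coord_grad(lambda z: f_i(i, z), x_prev, mu)) / len(batch)
    return v

# Toy usage on least-squares components f_i(x) = 0.5 * (a_i^T x - b_i)^2.
rng = np.random.default_rng(0)
n, d = 20, 5
A, b = rng.standard_normal((n, d)), rng.standard_normal(n)
f_i = lambda i, x: 0.5 * (A[i] @ x - b[i]) ** 2

x_prev = np.zeros(d)
x = 0.1 * rng.standard_normal(d)
# Full zeroth-order estimate at the reference point, then a cheap recursive update.
v0 = zo_coord_grad(lambda z: np.mean([f_i(i, z) for i in range(n)]), x_prev)
v1 = spider_zo_estimate(f_i, x, x_prev, v0, batch=rng.choice(n, 5, replace=False))
print(v1)
```

In an ADMM scheme of the kind the abstract describes, an estimate like `v1` would presumably drive the primal update of the smooth (loss) block, while the nonsmooth penalty blocks and the dual variables are updated by proximal or closed-form steps; those steps are not reproduced here.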
