Nonconvex Zeroth-Order Stochastic ADMM Methods with Lower Function Query Complexity

Zeroth-order (gradient-free) methods are a powerful class of optimization tools for many machine learning problems, because they require only function values (not gradients) during optimization. In particular, zeroth-order methods are well suited to complex problems such as black-box attacks and bandit feedback, where explicit gradients are difficult or infeasible to obtain. Although many zeroth-order methods have been developed recently, these approaches still suffer from two main drawbacks: 1) high function query complexity; 2) poor suitability for problems with complex penalties and constraints. To address these challenges, in this paper we propose a novel fast zeroth-order stochastic alternating direction method of multipliers (\emph{i.e.}, ZO-SPIDER-ADMM) with lower function query complexity for solving nonconvex problems with multiple nonsmooth penalties. Moreover, we prove that our ZO-SPIDER-ADMM has the optimal function query complexity of $O(dn + dn^{\frac{1}{2}}\epsilon^{-1})$ for finding an $\epsilon$-approximate local solution, where $n$ and $d$ denote the sample size and the dimension of the data, respectively. In particular, ZO-SPIDER-ADMM improves the existing best nonconvex zeroth-order ADMM methods by a factor of $O(d^{\frac{1}{3}}n^{\frac{1}{6}})$. Moreover, we propose a fast online variant (\emph{i.e.}, ZOO-SPIDER-ADMM). Our theoretical analysis shows that ZOO-SPIDER-ADMM has a function query complexity of $O(d\epsilon^{-\frac{3}{2}})$, which improves the existing best result by a factor of $O(\epsilon^{-\frac{1}{2}})$. Finally, we apply our algorithms to the task of structured adversarial attack on black-box deep neural networks to demonstrate their efficiency.
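The abstract does not spell out how a zeroth-order method turns function queries into a gradient surrogate, so the following is only a minimal Python/NumPy sketch of two ingredients it alludes to: a coordinate-wise finite-difference gradient estimator (which is why the dimension $d$ appears in every query-complexity bound) and a SPIDER-style recursive, variance-reduced estimate built from such queries. The function names, the smoothing parameter `mu`, and the quadratic toy loss are illustrative assumptions, not the paper's actual estimator or update rule.

```python
import numpy as np

def zo_coord_grad(f, x, mu=1e-4):
    """Coordinate-wise zeroth-order gradient estimate of f at x.

    Uses 2*d function queries: (f(x + mu*e_j) - f(x - mu*e_j)) / (2*mu)
    for each coordinate j. Purely illustrative; the paper's estimator
    and query schedule may differ.
    """
    d = x.size
    g = np.zeros(d)
    for j in range(d):
        e = np.zeros(d)
        e[j] = mu
        g[j] = (f(x + e) - f(x - e)) / (2.0 * mu)
    return g

def spider_zo_estimate(f_i, x, x_prev, v_prev, batch, mu=1e-4):
    """SPIDER-style recursive gradient estimate (assumed form):
    v_t = v_{t-1} + (1/|S|) * sum_{i in S} [g_i(x_t) - g_i(x_{t-1})],
    where each g_i is a zeroth-order estimate of the i-th component
    function's gradient.
    """
    v = v_prev.copy()
    for i in batch:
        v += (zo_coord_grad(lambda z: f_i(i, z), x, mu)
              - zo_coord_grad(lambda z: f_i(i, z), x_prev, mu)) / len(batch)
    return v

# Toy usage on least-squares components f_i(x) = 0.5 * (a_i^T x - b_i)^2.
rng = np.random.default_rng(0)
n, d = 20, 5
A, b = rng.standard_normal((n, d)), rng.standard_normal(n)
f_i = lambda i, x: 0.5 * (A[i] @ x - b[i]) ** 2

x_prev = np.zeros(d)
x = 0.1 * rng.standard_normal(d)
# Full zeroth-order estimate at the reference point, then a cheap recursive update.
v0 = zo_coord_grad(lambda z: np.mean([f_i(i, z) for i in range(n)]), x_prev)
v1 = spider_zo_estimate(f_i, x, x_prev, v0, batch=rng.choice(n, 5, replace=False))
print(v1)
```

In an ADMM scheme of the kind the abstract describes, an estimate like `v1` would presumably drive the primal update of the smooth (loss) block, while the nonsmooth penalty blocks and the dual variables are updated by proximal or closed-form steps; those steps are not reproduced here.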
