ZEROTH-ORDER STOCHASTIC PROJECTED GRADIENT DESCENT FOR NONCONVEX OPTIMIZATION

In this paper, we analyze the convergence of the zeroth-order stochastic projected gradient descent (ZO-SPGD) method for constrained convex and nonconvex optimization, where only objective function values (not gradients) are directly available. We establish statistical properties of a new random gradient estimator constructed from random direction samples drawn from a bounded uniform distribution. We prove that ZO-SPGD achieves an $O\!\left(\frac{d}{bq\sqrt{T}} + \frac{1}{\sqrt{T}}\right)$ convergence rate for convex but non-smooth optimization, where $d$ is the number of optimization variables, $b$ is the minibatch size, $q$ is the number of random direction samples used for gradient estimation, and $T$ is the number of iterations. For nonconvex optimization, we show that ZO-SPGD achieves an $O\!\left(\frac{1}{\sqrt{T}}\right)$ convergence rate but suffers an additional $O\!\left(\frac{d+q}{bq}\right)$ error. Our theoretical investigation of ZO-SPGD provides a general framework for studying the convergence rates of zeroth-order algorithms.
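To make the setup concrete, the sketch below shows a ZO-SPGD-style iteration in Python: a gradient estimate built only from function-value queries along $q$ random directions, followed by a projected descent step. This is an illustrative sketch, not the paper's exact construction; in particular, the sphere-sampled directions (the paper uses a bounded uniform distribution), the $\ell_2$-ball constraint set, and the step-size and smoothing choices are assumptions made here for demonstration.

```python
import numpy as np

def zo_gradient_estimate(f, x, q=10, mu=1e-3, rng=None):
    """Zeroth-order gradient estimate of f at x from q random direction
    samples, using only function-value queries. Directions are drawn
    uniformly from the unit Euclidean sphere (an assumption for this
    sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    g = np.zeros(d)
    fx = f(x)
    for _ in range(q):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)              # random unit direction
        g += (f(x + mu * u) - fx) / mu * u  # forward finite difference
    return (d / q) * g

def project_l2_ball(x, radius=1.0):
    """Euclidean projection onto an l2 ball (placeholder constraint set)."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def zo_spgd(f, x0, T=1000, eta=0.01, q=10, mu=1e-3):
    """Projected descent loop driven by the zeroth-order estimate."""
    x = x0.copy()
    for _ in range(T):
        g = zo_gradient_estimate(f, x, q=q, mu=mu)
        x = project_l2_ball(x - eta * g)
    return x

# Example: minimize a quadratic using only function evaluations.
if __name__ == "__main__":
    target = np.array([0.5, -0.3, 0.2])
    f = lambda x: np.sum((x - target) ** 2)
    x_hat = zo_spgd(f, x0=np.zeros(3), T=2000)
    print(x_hat)  # approaches target (which lies inside the unit ball)
```

In this sketch, increasing $q$ reduces the variance of the gradient estimate at the cost of more function queries per iteration, mirroring the role of $q$ in the stated convergence rates.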
