A Primer on Zeroth-Order Optimization in Signal Processing and Machine Learning: Principles, Recent Advances, and Applications

Zeroth-order (ZO) optimization is a subset of gradient-free optimization that emerges in many signal processing and machine learning (ML) applications. It solves optimization problems in much the same way as gradient-based methods, but it requires no gradients, relying only on function evaluations. Specifically, ZO optimization iteratively performs three major steps: gradient estimation, descent direction computation, and solution update. In this article, we provide a comprehensive review of ZO optimization, with an emphasis on the underlying intuition, optimization principles, and recent advances in convergence analysis. Moreover, we demonstrate promising applications of ZO optimization, such as evaluating the robustness of and generating explanations from black-box deep learning (DL) models, as well as efficient online sensor management.
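To make the three steps concrete, here is a minimal Python sketch of one standard instantiation: a two-point random-direction gradient estimator (in the spirit of Nesterov-style random gradient-free methods) plugged into plain gradient descent. It is an illustration rather than the article's specific algorithm; the names (zo_gradient, zo_gradient_descent) and the parameter choices (smoothing radius mu, query budget num_queries, step size) are our own assumptions.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-3, num_queries=20):
    """Two-point zeroth-order gradient estimate (illustrative, not the article's code).

    Averages num_queries single-direction estimates of the form
        (d / (2*mu)) * (f(x + mu*u) - f(x - mu*u)) * u,
    where u is drawn uniformly from the unit sphere in R^d.
    Only function evaluations of f are used; no derivatives are required.
    """
    d = x.size
    g = np.zeros(d)
    for _ in range(num_queries):
        u = np.random.randn(d)
        u /= np.linalg.norm(u)  # uniform random direction on the unit sphere
        g += (d / (2 * mu)) * (f(x + mu * u) - f(x - mu * u)) * u
    return g / num_queries

def zo_gradient_descent(f, x0, step=0.1, iters=200):
    """Plain ZO gradient descent: estimate the gradient, then step against it."""
    x = x0.copy()
    for _ in range(iters):
        x -= step * zo_gradient(f, x)  # descent direction = negative gradient estimate
    return x

# Example: minimize a "black-box" quadratic using only function values.
f = lambda x: np.sum((x - 1.0) ** 2)
x_star = zo_gradient_descent(f, np.zeros(5))
print(x_star)  # approaches the all-ones minimizer
```

Averaging over several random directions trades additional function queries for lower estimator variance; this query-versus-accuracy trade-off is central to the convergence analysis of ZO methods.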
