A Zeroth-Order Block Coordinate Descent Algorithm for Huge-Scale Black-Box Optimization

We consider the zeroth-order optimization problem in the huge-scale setting, where the dimension of the problem is so large that performing even basic vector operations on the decision variables is infeasible. In this paper, we propose a novel algorithm, coined ZO-BCD, that exhibits favorable overall query complexity and a much lower per-iteration computational cost than existing zeroth-order methods. In addition, we discuss how the memory footprint of ZO-BCD can be reduced even further through the use of circulant measurement matrices. As an application of our new method, we propose crafting adversarial attacks on neural-network-based classifiers in a wavelet domain, which can result in problem dimensions of over one million. In particular, we show that crafting adversarial examples against audio classifiers in a wavelet domain can achieve a state-of-the-art attack success rate of 97.9% with significantly less distortion.
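To make the core idea concrete, the following is a minimal sketch of zeroth-order block coordinate descent: at each iteration, a random block of coordinates is selected, the gradient restricted to that block is estimated from function queries alone via random finite differences, and only those coordinates are updated. The block selection scheme, step size, query budget, and gradient estimator here are illustrative simplifications, not the paper's exact ZO-BCD (which additionally exploits gradient sparsity and structured measurement matrices).

```python
import numpy as np

def zo_bcd_step(f, x, block, delta=1e-4, lr=0.1, num_queries=10, rng=None):
    """One illustrative ZO-BCD step: estimate the gradient on a single
    coordinate block via random finite differences, then descend on that
    block only. Per-iteration work scales with the block size, not the
    full problem dimension."""
    rng = rng or np.random.default_rng()
    d_block = len(block)
    g = np.zeros(d_block)
    fx = f(x)
    for _ in range(num_queries):
        u = rng.standard_normal(d_block)   # random direction within the block
        x_pert = x.copy()
        x_pert[block] += delta * u
        g += (f(x_pert) - fx) / delta * u  # finite-difference directional estimate
    g /= num_queries
    x_new = x.copy()
    x_new[block] -= lr * g                 # update only the chosen block
    return x_new

def zo_bcd(f, x0, num_blocks=4, iters=200, rng=None):
    """Repeatedly pick a random coordinate block and take one ZO-BCD step."""
    rng = rng or np.random.default_rng(0)
    x = x0.copy()
    d = len(x)
    for _ in range(iters):
        block = rng.choice(d, size=d // num_blocks, replace=False)
        x = zo_bcd_step(f, x, block, rng=rng)
    return x
```

For example, running `zo_bcd` on the quadratic `f(x) = ||x||^2` from `x0 = np.ones(20)` drives the objective toward zero using only function evaluations, with each iteration touching just a 5-dimensional block of the 20 variables.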
