SpecAttack: Specification-Based Adversarial Training for Deep Neural Networks

Safety specification-based adversarial training aims to generate examples that violate a formal safety specification and thereby provides a basis for repair. Maintaining high prediction accuracy while ensuring safe behavior remains challenging. We therefore present SpecAttack, a query-efficient counter-example generation and repair method for deep neural networks. SpecAttack allows specifying safety constraints on the model and finding inputs that violate these constraints. These violations are then used to repair the neural network via re-training such that it becomes provably safe. We evaluate SpecAttack's performance on the tasks of counter-example generation and repair. Our experimental evaluation demonstrates that SpecAttack is in most cases more query-efficient than comparable attacks and yields counter-examples of higher quality, while its repair technique is more efficient, maintains higher functional correctness, and provably guarantees compliance with the safety specification.

Introduction

Deep neural networks (DNNs) are increasingly applied in safety-critical domains such as self-driving cars, unmanned aircraft, medical diagnosis, and face-recognition-based security protocols. Due to this widespread adoption, it is all the more important to ensure that these neural networks behave as expected, that is, in accordance with a formal safety specification. Unfortunately, it has been shown that neural networks are vulnerable to adversarial examples, which are sometimes intentionally crafted. These inputs, also called counter-examples, cause the neural network to exhibit unsafe behavior. Finding inputs that lead to such errors is thus necessary to identify the limitations of current machine learning models and to suggest ways to provably repair them.

In this paper, we focus on the efficient generation of high-quality counter-examples, as well as on leveraging them to perform a repair that produces provably safe neural networks. This approach is related to adversarial training as performed by (Goodfellow, Shlens, and Szegedy 2015; Madry et al. 2018), but considers formal safety specifications instead of a defence against robustness attacks. We therefore discuss three threads of work on counter-example generation and repair of DNNs in the context of formal safety specifications, and sketch the counter-example search below:

Verification. The formal methods community has developed techniques that verify a neural network's safety in a
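
To make the counter-example search concrete, the following minimal sketch (not the authors' implementation; the toy model, input box, and output constraint are hypothetical placeholders) phrases a safety specification as a scalar violation objective and hands it to a black-box global optimizer, here SciPy's differential evolution:

import numpy as np
from scipy.optimize import differential_evolution

# Hypothetical trained model: returns class scores for an input vector.
def model(x: np.ndarray) -> np.ndarray:
    W = np.array([[1.0, -2.0], [0.5, 1.5]])  # stand-in weights for the example
    return np.maximum(W @ x, 0.0)            # a single ReLU layer as a stand-in

# Assumed safety specification: for all inputs in the box [0, 1]^2, the score
# of class 0 must stay below the score of class 1. A violation exists wherever
# score_0 - score_1 >= 0.
def violation_penalty(x: np.ndarray) -> float:
    scores = model(x)
    margin = scores[0] - scores[1]
    # Minimizing -margin pushes the optimizer toward inputs that maximize the
    # violation margin, i.e., toward counter-examples.
    return -margin

bounds = [(0.0, 1.0), (0.0, 1.0)]  # the input region named in the specification
result = differential_evolution(violation_penalty, bounds, seed=0, maxiter=200)

if -result.fun >= 0.0:
    print("counter-example found:", result.x)
else:
    print("no violation found within the query budget")

The other global optimizers cited below (generalized simulated annealing, basin hopping, and the simplicial homology algorithm) expose similar interfaces in SciPy, taking either bounds or a starting point, and could play the same role as differential evolution in this sketch.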

[1] Logan Engstrom, et al. Black-box Adversarial Attacks with Limited Queries and Information, 2018, ICML.

[2] Guy Katz, et al. Minimal Modifications of Deep Neural Networks using Verification, 2020, LPAR.

[3] Mykel J. Kochenderfer, et al. The Marabou Framework for Verification and Analysis of Deep Neural Networks, 2019, CAV.

[4] Xiang, et al. Efficiency of generalized simulated annealing, 2000, Physical Review E: Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics.

[5] Jian Sun, et al. Identity Mappings in Deep Residual Networks, 2016, ECCV.

[6] Artem Babenko, et al. Editable Neural Networks, 2020, ICLR.

[7] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.

[8] Mislav Balunovic, et al. DL2: Training and Querying Neural Networks with Logic, 2019, ICML.

[9] Aleksander Madry, et al. Towards Deep Learning Models Resistant to Adversarial Attacks, 2017, ICLR.

[10] Alice E. Smith, et al. Penalty functions, 1996.

[11] Matthew Sotoudeh, et al. Computing Linear Restrictions of Neural Networks, 2019, NeurIPS.

[12] Jonathon Shlens, et al. Explaining and Harnessing Adversarial Examples, 2014, ICLR.

[13] David A. Wagner, et al. Towards Evaluating the Robustness of Neural Networks, 2016, 2017 IEEE Symposium on Security and Privacy (SP).

[14] R. Storn, et al. Differential Evolution, 2004.

[15] Matthew Mirman, et al. Differentiable Abstract Interpretation for Provably Robust Neural Networks, 2018, ICML.

[16] Carl Sandrock, et al. A simplicial homology algorithm for Lipschitz optimisation, 2018, Journal of Global Optimization.

[17] Matthias Bethge, et al. Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models, 2017, ICLR.

[18] Hyun Oh Song, et al. Parsimonious Black-Box Adversarial Attacks via Efficient Combinatorial Optimization, 2019, ICML.

[19] Timon Gehr, et al. An abstract domain for certifying neural networks, 2019, Proc. ACM Program. Lang.

[20] Mykel J. Kochenderfer, et al. Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks, 2017, CAV.

[21] Min Wu, et al. Safety Verification of Deep Neural Networks, 2016, CAV.

[22] Mykel J. Kochenderfer, et al. Deep Neural Network Compression for Aircraft Collision Avoidance Systems, 2018, Journal of Guidance, Control, and Dynamics.

[23] Lawrence D. Jackel, et al. Backpropagation Applied to Handwritten Zip Code Recognition, 1989, Neural Computation.

[24] Michael I. Jordan, et al. HopSkipJumpAttack: A Query-Efficient Decision-Based Attack, 2019, 2020 IEEE Symposium on Security and Privacy (SP).

[25] Panagiotis Kouvaros, et al. Efficient Verification of ReLU-Based Neural Networks via Dependency Analysis, 2020, AAAI.

[26] Amarda Shehu, et al. Basin Hopping as a General and Versatile Optimization Framework for the Characterization of Biological Macromolecules, 2012, Adv. Artif. Intell.

[27] Rainer Storn, et al. Differential Evolution – A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces, 1997, J. Glob. Optim.