Adversarially Robust Classification Based on GLRT

Machine learning models are vulnerable to adversarial attacks that can often cause misclassification by introducing small but well-designed perturbations. In this paper, we explore, in the setting of classical composite hypothesis testing, a defense strategy based on the generalized likelihood ratio test (GLRT), which jointly estimates the class of interest and the adversarial perturbation. We evaluate the GLRT approach for the special case of binary hypothesis testing in white Gaussian noise under $\ell_{\infty}$ norm-bounded adversarial perturbations, a setting for which a minimax strategy optimizing for the worst-case attack is known. We show that the GLRT approach yields performance competitive with that of the minimax approach under the worst-case attack, and that it yields a better robustness-accuracy trade-off under weaker attacks, depending on the magnitudes of the signal components relative to the attack budget. We also observe that the GLRT defense generalizes naturally to more complex models for which optimal minimax classifiers are not known.
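
To make the two decision rules concrete, the sketch below works out the GLRT for an assumed antipodal signal model $y = \pm\mu + e + n$ with $\|e\|_\infty \le \epsilon$ and $n \sim \mathcal{N}(0, \sigma^2 I)$; maximizing the likelihood over the perturbation under each hypothesis reduces to soft-thresholding the residual. The companion minimax-style rule, a linear detector matched to the soft-thresholded template, is included only as an assumption about the known worst-case-optimal classifier in this setting, not as the paper's exact construction. All names (soft_threshold, glrt_decision, minimax_decision, mu, eps, sigma) are illustrative.

import numpy as np

def soft_threshold(x, eps):
    # Componentwise shrinkage by eps: entries with |x_i| <= eps are zeroed.
    return np.sign(x) * np.maximum(np.abs(x) - eps, 0.0)

def glrt_decision(y, mu, eps):
    # GLRT for the assumed model y = +/- mu + e + n, ||e||_inf <= eps.
    # Maximizing the Gaussian likelihood over e under each hypothesis is
    # equivalent to minimizing the residual energy with each component
    # clipped by eps, i.e. the squared soft-thresholded residual.
    r1 = soft_threshold(y - mu, eps)  # best-case residual under H1 (signal +mu)
    r0 = soft_threshold(y + mu, eps)  # best-case residual under H0 (signal -mu)
    return 1 if np.sum(r1 ** 2) < np.sum(r0 ** 2) else -1

def minimax_decision(y, mu, eps):
    # Assumed minimax-style rule: a linear detector matched to the
    # soft-thresholded template, which ignores components with |mu_i| <= eps
    # since an ell_inf attacker can flip their contribution.
    return 1 if np.dot(soft_threshold(mu, eps), y) >= 0 else -1

# Toy comparison under a worst-case ell_inf attack of budget eps.
rng = np.random.default_rng(0)
d, eps, sigma = 64, 0.5, 1.0
mu = rng.uniform(0.2, 1.5, size=d)       # signal template (hypothetical values)
label = 1
attack = -label * eps * np.sign(mu)      # pushes y toward the wrong class
y = label * mu + attack + sigma * rng.standard_normal(d)
print(glrt_decision(y, mu, eps), minimax_decision(y, mu, eps))

In this sketch, the robustness-accuracy trade-off mentioned above shows up in how the two rules treat weak signal components: the minimax detector discards any component with $|\mu_i| \le \epsilon$ regardless of the actual attack, whereas the GLRT re-estimates the perturbation from the observation and can still exploit those components when the attack is weaker than the budget.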
