Domain Knowledge Alleviates Adversarial Attacks in Multi-Label Classifiers

Adversarial attacks on machine learning-based classifiers, along with defense mechanisms, have been widely studied in the context of single-label classification problems. In this paper, we shift the attention to multi-label classification, where the availability of domain knowledge on the relationships among the considered classes may offer a natural way to spot incoherent predictions, i.e., predictions associated with adversarial examples lying outside the training data distribution. We explore this intuition in a framework in which first-order logic knowledge is converted into constraints and injected into a semi-supervised learning problem. Within this setting, the constrained classifier learns to fulfill the domain knowledge over the marginal distribution, and can naturally reject samples with incoherent predictions. Even though our method does not exploit any knowledge of attacks during training, our experimental analysis surprisingly shows that domain-knowledge constraints can help detect adversarial examples effectively, especially if such constraints are not known to the attacker. We show how to implement an adaptive attack that exploits knowledge of the constraints and, in a specifically designed setting, we provide experimental comparisons with popular state-of-the-art attacks. We believe that our approach may provide a significant step towards designing more robust multi-label classifiers.
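To make the idea concrete, the following minimal sketch (ours, not the authors' released code) illustrates one way a single first-order rule, e.g. cat(x) → animal(x), could be relaxed with a product t-norm into a differentiable penalty, added to a semi-supervised multi-label objective on unlabeled data, and reused at test time as a rejection score. The class indices, the penalty weight lam, and the threshold tau are illustrative assumptions, and the product t-norm is just one possible relaxation.

```python
# Minimal sketch of constraint-based training and rejection (assumed, not the
# paper's exact formulation). A FOL rule "cat(x) -> animal(x)" is relaxed with
# a product t-norm; its violation is penalized on unlabeled data and used as a
# rejection score at test time.
import torch
import torch.nn as nn

CAT, ANIMAL = 0, 1   # hypothetical class indices
TAU = 0.3            # hypothetical rejection threshold

def constraint_violation(p):
    """Degree to which 'cat(x) -> animal(x)' is violated.
    p: (batch, n_classes) tensor of predicted class probabilities."""
    # Product t-norm relaxation of the rule's negation: cat AND NOT animal.
    return p[:, CAT] * (1.0 - p[:, ANIMAL])

def semi_supervised_loss(model, x_lab, y_lab, x_unl, lam=1.0):
    """Supervised multi-label BCE plus constraint penalty on unlabeled data."""
    sup = nn.BCEWithLogitsLoss()(model(x_lab), y_lab.float())
    p_unl = torch.sigmoid(model(x_unl))
    return sup + lam * constraint_violation(p_unl).mean()

def predict_with_reject(model, x):
    """Return multi-label predictions and an accept mask: samples whose
    outputs violate the rule beyond TAU are flagged for rejection."""
    p = torch.sigmoid(model(x))
    accept = constraint_violation(p) <= TAU
    return (p > 0.5).int(), accept
```

In this sketch the rejection mechanism is attack-agnostic: it never sees adversarial examples during training and only measures how strongly a prediction contradicts the encoded domain knowledge.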
