Why the Failure? How Adversarial Examples Can Provide Insights for Interpretable Machine Learning

Recent advances in Machine Learning (ML) have profoundly changed many detection, classification, recognition and inference tasks. Given the complexity of the battlespace, ML has the potential to revolutionise how Coalition Situation Understanding is synthesised and revised. However, many issues must be overcome before its widespread adoption. In this paper we consider two of these: interpretability and adversarial attacks. Interpretability is needed because military decision-makers must be able to justify their decisions. Adversarial attacks arise because many ML algorithms are highly sensitive to certain kinds of input perturbations. We argue that these two issues are conceptually linked, and that insights into one can yield insights into the other. We illustrate these ideas with relevant examples from the literature and with our own experiments.
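As a concrete illustration of the input sensitivity mentioned above, the fast gradient sign method of Goodfellow et al. perturbs an input along the sign of the loss gradient so that a change imperceptible to a human can flip the predicted class. The sketch below is a minimal, illustrative PyTorch version, not the paper's own code; the names model, x, y and the value of epsilon are assumptions made for the example.

import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=0.03):
    """Return x nudged in the direction that most increases the classification loss.

    model, x, y and epsilon are illustrative placeholders: a trained classifier,
    an input batch scaled to [0, 1], its true labels, and the perturbation budget.
    """
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step each input dimension by +/- epsilon according to the sign of its gradient.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

Applied to a correctly classified image, the returned x_adv typically looks unchanged to a human observer yet is misclassified; explaining why such small perturbations succeed is precisely where adversarial examples and interpretability meet.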
