Bridging Mode Connectivity in Loss Landscapes and Adversarial Robustness

Mode connectivity provides novel geometric insights into loss landscapes and enables the construction of high-accuracy paths between well-trained neural networks. In this work, we propose to employ mode connectivity in loss landscapes to study the adversarial robustness of deep neural networks and to develop novel methods for improving that robustness. Our experiments cover various types of adversarial attacks applied to different network architectures and datasets. When models are tampered with by backdoor or error-injection attacks, our results demonstrate that a path connection learned from a limited amount of bona fide data can effectively mitigate the adversarial effects while maintaining the original accuracy on clean data. Mode connectivity therefore gives users a practical means of repairing backdoored or error-injected models. We also use mode connectivity to investigate the loss landscapes of standard and robust models under evasion attacks. Experiments show that a barrier in adversarial robustness loss exists on the path connecting regular and adversarially trained models. We further observe a high correlation between the adversarial robustness loss and the largest eigenvalue of the input Hessian matrix, for which we provide theoretical justification. Our results suggest that mode connectivity offers a holistic tool and practical means for evaluating and improving adversarial robustness.
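The path connection referred to here follows the quadratic Bezier parametrization of Garipov et al. (2018): a control point is trained so that the loss stays low at every point along the curve joining the two endpoint models. The sketch below is a minimal illustration under our own assumptions, not the authors' implementation; in particular, `loss_on_weights` is a hypothetical callable that evaluates the clean-data loss of a model instantiated from a flat weight vector (e.g., built with `torch.func.functional_call` so that gradients flow back to the control point).

```python
# Minimal sketch of learning a quadratic Bezier path between two trained
# models (in the style of Garipov et al., 2018). `loss_on_weights` is an
# assumed helper, not part of the paper's released code.
import torch


def bezier_point(w1, theta, w2, t):
    """Quadratic Bezier curve phi(t) with endpoints w1, w2 and control point theta."""
    return (1 - t) ** 2 * w1 + 2 * t * (1 - t) * theta + t ** 2 * w2


def train_path(w1, w2, loss_on_weights, steps=1000, lr=1e-2):
    """Learn the control point theta so that the loss is low along the whole path.

    w1, w2:          flattened weight vectors of the two endpoint models
    loss_on_weights: differentiable callable mapping a weight vector to a
                     scalar loss on the (limited) bona fide data
    """
    theta = ((w1 + w2) / 2).clone().requires_grad_(True)  # initialize at midpoint
    opt = torch.optim.SGD([theta], lr=lr)
    for _ in range(steps):
        t = torch.rand(())  # sample a random point on the curve each step
        loss = loss_on_weights(bezier_point(w1, theta, w2, t))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return theta
```

Sampling t uniformly at each step optimizes the expected loss along the curve rather than the loss at any fixed point, which is what keeps the entire path in a low-loss region.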

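The reported correlation with the input Hessian suggests a simple diagnostic: the largest eigenvalue of the Hessian of the loss with respect to the input can be estimated without forming the Hessian explicitly, via power iteration over Hessian-vector products. The sketch below is our own illustrative implementation, not the paper's code; `model` and `criterion` stand in for any differentiable classifier and loss.

```python
# Hedged sketch: estimate the largest eigenvalue of the input Hessian
# with power iteration on Hessian-vector products. `model` and
# `criterion` are placeholders for any differentiable classifier/loss.
import torch


def input_hessian_top_eig(model, criterion, x, y, iters=20):
    x = x.clone().requires_grad_(True)
    loss = criterion(model(x), y)
    # Gradient w.r.t. the input, kept in the graph for second-order terms.
    (grad,) = torch.autograd.grad(loss, x, create_graph=True)

    v = torch.randn_like(x)
    v = v / v.norm()
    eig = None
    for _ in range(iters):
        # Hessian-vector product: d/dx (grad . v), without building H.
        (hv,) = torch.autograd.grad((grad * v).sum(), x, retain_graph=True)
        eig = (v * hv).sum()           # Rayleigh quotient estimate
        v = hv / (hv.norm() + 1e-12)   # renormalize for the next iteration
    return eig.item()
```

Since the input Hessian is symmetric, power iteration converges to its eigenvalue of largest magnitude, which serves as the curvature proxy correlated with the adversarial robustness loss in the abstract.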