KATANA: Simple Post-Training Robustness Using Test Time Augmentations

Although Deep Neural Networks (DNNs) achieve excellent performance on many real-world tasks, they are highly vulnerable to adversarial attacks. A leading defense against such attacks is adversarial training, in which a DNN is made robust by injecting adversarial perturbations into its inputs during training. This procedure is effective but can only be applied in the training phase. In this work, we propose KATANA, a simple and easy-to-use technique for robustifying an existing pretrained DNN without modifying its weights. For every image, we generate N randomized Test-Time Augmentations (TTAs) by applying diverse color, blur, noise, and geometric transforms. We then use the DNN's logits on these augmentations to train a simple random forest classifier that predicts the true class label. Our strategy achieves state-of-the-art adversarial robustness against diverse attacks with minimal compromise on natural image classification accuracy. We also test KATANA against two adaptive white-box attacks, where it shows excellent results when combined with adversarial training. Code is available at https://github.com/giladcohen/KATANA.
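The following is a minimal sketch of the pipeline described above, assuming a frozen pretrained PyTorch classifier, standard torchvision transforms, and scikit-learn's RandomForestClassifier. The specific augmentations, the number of TTAs (N_TTA), and the forest hyperparameters are illustrative placeholders, not the paper's exact configuration.

import torch
import torchvision.transforms as T
from sklearn.ensemble import RandomForestClassifier

N_TTA = 16  # number of randomized test-time augmentations per image (assumed value)

# Diverse color / blur / noise / geometric augmentations (illustrative set).
tta = T.Compose([
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    T.GaussianBlur(kernel_size=3, sigma=(0.1, 1.0)),
    T.RandomAffine(degrees=10, translate=(0.05, 0.05)),
    T.Lambda(lambda x: (x + 0.02 * torch.randn_like(x)).clamp(0.0, 1.0)),  # additive noise
])

@torch.no_grad()
def tta_logit_features(model, images):
    """Stack the frozen model's logits over N_TTA random augmentations into one
    feature vector per image, of shape [batch, N_TTA * num_classes]."""
    model.eval()
    feats = [model(torch.stack([tta(img) for img in images])) for _ in range(N_TTA)]
    return torch.cat(feats, dim=1).cpu().numpy()

def fit_katana(model, train_images, train_labels):
    """Fit the simple random forest on TTA logit features of training images;
    the DNN's weights are never modified."""
    forest = RandomForestClassifier(n_estimators=100)
    forest.fit(tta_logit_features(model, train_images), train_labels)
    return forest

def predict_katana(model, forest, test_images):
    """Predict labels for (possibly adversarial) test images via the forest."""
    return forest.predict(tta_logit_features(model, test_images))

Because the forest sees only logits computed on randomized augmentations, it can be attached to any pretrained classifier post hoc; the randomness of the transforms is what the defense relies on, so they should be re-sampled at every call rather than fixed.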
