Fuzz Testing based Data Augmentation to Improve Robustness of Deep Neural Networks

Deep neural networks (DNNs) have been shown to be notoriously brittle to small perturbations in their input data. This problem is analogous to the over-fitting problem in test-based program synthesis and automatic program repair, which is a consequence of an incomplete specification, i.e., the limited tests or training examples that the program synthesis or repair algorithm has to learn from. Recently, test generation techniques have been successfully employed to augment existing specifications of intended program behavior and thereby improve the generalizability of program synthesis and repair. Inspired by these approaches, in this paper we propose a technique that re-purposes software testing methods, specifically mutation-based fuzzing, to augment the training data of DNNs, with the objective of enhancing their robustness. Our technique casts the DNN data augmentation problem as an optimization problem. It uses genetic search to generate the most suitable variant of each input to use for training the DNN, while simultaneously identifying opportunities to accelerate training by skipping augmentation in many instances. We instantiate this technique in two tools, Sensei and Sensei-SA, and evaluate them on 15 DNN models spanning 5 popular image datasets. Our evaluation shows that Sensei improves the robust accuracy of the DNN, compared to the state of the art, on each of the 15 models, by up to 11.9% and by 5.5% on average. Further, Sensei-SA can reduce the average DNN training time by 25%, while still improving robust accuracy.
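The optimization view described above can be illustrated with a minimal sketch. It is not the Sensei implementation: the transformation space (a cyclic shift and a brightness offset on a 1-D signal), the bounds, the population settings, the use of model loss as the fitness function, and the `skip_below` threshold for skipping augmentation are all assumptions chosen only to show the shape of a genetic search for a high-loss variant of one training input.

```python
import random

# Hypothetical transformation space; these bounds are illustrative assumptions.
BOUNDS = {"shift": (-3, 3), "brightness": (-0.2, 0.2)}
IDENTITY = (0, 0.0)  # parameters that leave the input unchanged


def apply_transform(x, params):
    """Cyclically shift a 1-D signal and add a brightness offset, clipped to [0, 1]."""
    shift, brightness = params
    n = len(x)
    return [min(1.0, max(0.0, x[(i - shift) % n] + brightness)) for i in range(n)]


def random_params(rng):
    """Sample one chromosome uniformly from the transformation space."""
    return (rng.randint(*BOUNDS["shift"]), rng.uniform(*BOUNDS["brightness"]))


def mutate(params, rng):
    """Perturb each gene slightly, staying inside the bounds."""
    (lo_s, hi_s), (lo_b, hi_b) = BOUNDS["shift"], BOUNDS["brightness"]
    shift = min(hi_s, max(lo_s, params[0] + rng.choice([-1, 0, 1])))
    brightness = min(hi_b, max(lo_b, params[1] + rng.gauss(0, 0.05)))
    return (shift, brightness)


def crossover(a, b, rng):
    """Build a child that takes one gene from each parent."""
    return (a[0], b[1]) if rng.random() < 0.5 else (b[0], a[1])


def worst_variant(x, loss_fn, rng, pop_size=8, generations=5, skip_below=None):
    """Search for the transformed variant of x with the highest loss.

    loss_fn stands in for the current model's loss on a candidate input.
    If skip_below is set and the loss on x is already below it, augmentation
    is skipped (an assumed stand-in for a selective-augmentation rule).
    """
    if skip_below is not None and loss_fn(x) < skip_below:
        return x, IDENTITY

    def fitness(p):
        return loss_fn(apply_transform(x, p))

    # Seed the population with the identity so the result is never easier than x.
    population = [IDENTITY] + [random_params(rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # elitism: the fittest half survives
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    best = max(population, key=fitness)
    return apply_transform(x, best), best


# Toy usage: the "loss" is the squared distance from the original signal.
x = [0.1, 0.9, 0.5, 0.3]
toy_loss = lambda v: sum((a - b) ** 2 for a, b in zip(v, x))
variant, params = worst_variant(x, toy_loss, random.Random(0))
```

In a training loop, a search of this kind would run once per input per epoch, and the returned variant would replace the original input in the batch. Because the fittest half of each generation is carried over unchanged, the selected variant never has a lower loss than the unmodified input.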
