D-Score: A White-Box Diagnosis Score for CNNs Based on Mutation Operators

Convolutional neural networks (CNNs) have been widely applied in safety-critical domains such as autonomous driving and medical diagnosis. However, concerns have been raised about the trustworthiness of these models. The standard testing method evaluates a model's performance on a test set, but a low-quality or insufficient test set can yield unreliable evaluation results, with unforeseeable consequences. Two key problems therefore need urgent attention: how to evaluate CNNs comprehensively, and how to use the evaluation results to enhance their trustworthiness. Prior work has used mutation testing to evaluate the test sets of CNNs. However, the resulting evaluation scores are black boxes that do not make explicit what is actually being tested. In this paper, we propose a white-box diagnostic approach that uses mutation operators and image transformations to calculate the model's feature and attention distributions, and we further present a diagnosis score, D-Score, that reflects the model's robustness and its fitness to a dataset. We also propose a D-Score-based data augmentation method that improves CNN performance under translation and rescaling. Comprehensive experiments on two widely used datasets and three commonly adopted CNNs demonstrate the effectiveness of our approach.
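
The abstract does not give the D-Score formula, so the following is not the authors' method. It is a minimal, hypothetical sketch of the simpler idea behind probing a CNN with image transformations: apply translations and rescalings to an input and measure how often a classifier's prediction stays unchanged. The names `translate`, `rescale`, `consistency_rate`, the toy classifier `left_heavy`, and the nested-list image representation are all illustrative assumptions, chosen to keep the example dependency-free.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a grayscale image as rows of pixel intensities.
Image = List[List[float]]


def translate(img: Image, dx: int, dy: int, fill: float = 0.0) -> Image:
    """Shift the image right by dx and down by dy pixels (negative values
    shift left/up); pixels shifted in from outside are set to `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx  # source pixel that lands at (y, x)
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out


def rescale(img: Image, factor: float, fill: float = 0.0) -> Image:
    """Rescale about the image centre with nearest-neighbour sampling while
    keeping the canvas size: factor > 1 zooms in, factor < 1 zooms out."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Map each output pixel back to its nearest source pixel.
            sy = round(cy + (y - cy) / factor)
            sx = round(cx + (x - cx) / factor)
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out


def consistency_rate(predict: Callable[[Image], int], img: Image,
                     shifts: Sequence[Tuple[int, int]],
                     scales: Sequence[float]) -> float:
    """Fraction of translated/rescaled variants of `img` on which `predict`
    returns the same label as on the untransformed image."""
    base = predict(img)
    variants = [translate(img, dx, dy) for dx, dy in shifts]
    variants += [rescale(img, s) for s in scales]
    if not variants:
        raise ValueError("need at least one shift or scale")
    agree = sum(1 for v in variants if predict(v) == base)
    return agree / len(variants)


def left_heavy(img: Image) -> int:
    """Toy classifier: label 1 if the left half holds more intensity."""
    half = len(img[0]) // 2
    left = sum(v for row in img for v in row[:half])
    right = sum(v for row in img for v in row[half:])
    return int(left > right)


SAMPLE: Image = [
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]

if __name__ == "__main__":
    rate = consistency_rate(left_heavy, SAMPLE, [(1, 0), (0, 1)], [1.0])
    print(f"prediction unchanged on {rate:.2f} of variants")
```

With the toy `left_heavy` classifier, the one-pixel rightward shift moves half of the bright block across the midline and flips the label, while the downward shift and the identity rescale do not, so the rate is 2/3. A label-agreement check like this only conveys the intuition; it does not involve mutation operators or the feature and attention distributions described above.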
