Test4Deep: an Effective White-Box Testing for Deep Neural Networks

Current testing for Deep Neural Networks (DNNs) focuses on the quantity of test cases but ignores their diversity. To the best of our knowledge, DeepXplore is the first white-box framework for deep learning testing; it triggers differential behaviors between multiple DNNs and increases neuron coverage to improve diversity. Because it relies on multiple DNNs, it faces two problems: (1) the framework is not applicable to a single DNN, and (2) if all DNNs make the same incorrect prediction simultaneously, DeepXplore cannot generate test cases. This paper presents Test4Deep, a white-box testing framework based on a single DNN. Test4Deep avoids the shortcomings of multiple DNNs by inducing inconsistencies between the predicted labels of original inputs and those of generated test inputs. Meanwhile, Test4Deep improves neuron coverage, and thus captures more diversity, by attempting to activate more inactive neurons. The proposed method was evaluated on three popular datasets with nine DNNs. Compared to DeepXplore, Test4Deep produced on average 4.59% (maximum 10.49%) more test cases, all of which exposed errors and faults in the DNNs. These test cases achieved a 19.57% larger diversity increment and a 25.88% larger increment in neuron coverage. Test4Deep can further be used to improve the accuracy of DNNs by an average of up to 5.72% (maximum 7.0%).
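To make the search objective concrete, the sketch below illustrates the idea described above: perturb an input until its predicted label diverges from the original prediction while also pushing a currently inactive neuron toward activation. This is a minimal illustration under stated assumptions, not the authors' implementation; the Keras/TensorFlow API usage, the layer name argument, the least-activated-neuron heuristic, and the hyperparameters (lam, step, iters) are all placeholders chosen for the example.

    import tensorflow as tf

    def generate_test_input(model, x, layer_name, lam=1.0, step=0.01, iters=100):
        # Hypothetical sketch: jointly (a) lower confidence in the original label
        # so the prediction becomes inconsistent with the original input, and
        # (b) raise the activation of a least-activated neuron in a hidden layer.
        hidden = tf.keras.Model(model.input, model.get_layer(layer_name).output)
        orig = int(tf.argmax(model(x[None, ...]), axis=1)[0])  # original label

        x_adv = tf.Variable(x[None, ...], dtype=tf.float32)
        for _ in range(iters):
            with tf.GradientTape() as tape:
                logits = model(x_adv)
                acts = tf.reshape(hidden(x_adv), [-1])
                target = tf.argmin(acts)                  # pick a low/inactive neuron
                obj = -logits[0, orig] + lam * tf.gather(acts, target)
            grad = tape.gradient(obj, x_adv)
            x_adv.assign_add(step * tf.sign(grad))        # FGSM-style ascent step
            x_adv.assign(tf.clip_by_value(x_adv, 0.0, 1.0))
            if int(tf.argmax(model(x_adv), axis=1)[0]) != orig:
                break                                     # label diverged: candidate test case
        return x_adv.numpy()[0]

In practice such a search would also constrain perturbations to remain realistic for the input domain (e.g., bounded pixel changes); the exact constraints and coverage bookkeeping used by Test4Deep are described in the paper itself.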

[1] Sarfraz Khurshid, et al. DeepRoad: GAN-based Metamorphic Autonomous Driving System Testing, 2018, arXiv.

[2] Mykel J. Kochenderfer, et al. Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks, 2017, CAV.

[3] Tomas Pfister, et al. Learning from Simulated and Unsupervised Images through Adversarial Training, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4] Ian H. Witten, et al. Data Mining: Practical Machine Learning Tools and Techniques, 2014.

[5] Suman Jana, et al. DeepTest: Automated Testing of Deep-Neural-Network-Driven Autonomous Cars, 2017, 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE).

[6] Hoi-Jun Yoo, et al. 14.2 DNPU: An 8.1TOPS/W reconfigurable CNN-RNN processor for general-purpose deep neural networks, 2017, 2017 IEEE International Solid-State Circuits Conference (ISSCC).

[7] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.

[8] Seyed-Mohsen Moosavi-Dezfooli, et al. DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9] Min Wu, et al. Safety Verification of Deep Neural Networks, 2016, CAV.

[10] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.

[11] Lei Ma, et al. DeepGauge: Comprehensive and Multi-Granularity Testing Criteria for Gauging the Robustness of Deep Learning Systems, 2018, arXiv.

[12] Yann LeCun, et al. The MNIST database of handwritten digits, 2005.

[13] Junfeng Yang, et al. DeepXplore: Automated Whitebox Testing of Deep Learning Systems, 2019, Commun. ACM.

[14] Joan Bruna, et al. Intriguing properties of neural networks, 2013, ICLR.

[15] Xin Zhang, et al. End to End Learning for Self-Driving Cars, 2016, arXiv.

[16] David Zhang, et al. FSIM: A Feature Similarity Index for Image Quality Assessment, 2011, IEEE Transactions on Image Processing.

[17] Daniel Kroening, et al. Testing Deep Neural Networks, 2018, arXiv.

[18] Fei-Fei Li, et al. ImageNet: A large-scale hierarchical image database, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[19] David Zhang, et al. A comprehensive evaluation of full reference image quality assessment algorithms, 2012, 2012 19th IEEE International Conference on Image Processing.

[20] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21] Jonathon Shlens, et al. Explaining and Harnessing Adversarial Examples, 2014, ICLR.