ARTDL: Adaptive Random Testing for Deep Learning Systems

With recent breakthroughs in Deep Learning (DL), DL systems are increasingly deployed in safety-critical fields, so software testing methods are needed to ensure their reliability and safety. Because the rules of a DL system are inferred from training data, it is difficult to know the implementation rules behind each of its behaviors. Random Testing (RT) is a popular testing method that requires no knowledge of the software implementation, which makes it well suited to testing DL systems; indeed, existing mechanisms for testing DL systems already rely heavily on RT over labeled test data. To increase the effectiveness of RT for DL systems, we design, implement, and evaluate Adaptive Random Testing for DL systems (ARTDL), the first Adaptive Random Testing (ART) method for this purpose. ARTDL follows the core idea of ART: fewer test cases are needed to detect failures when the next test case selected is the one farthest from the non-failure-causing test cases. First, we propose the Feature-based Euclidean Distance (FED) as a distance metric for measuring the difference between failure-causing and non-failure-causing inputs. Second, we verify the validity of FED by presenting the failure patterns of DL models. Finally, we design the ARTDL algorithm, which uses FED to generate test cases that are more likely to cause failures. We apply ARTDL to top-performing DL models in image classification and autonomous driving. The results show that, compared with RT, ARTDL reduces the number of test cases needed to find the first bug by 62.74% on average.
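The selection idea described above can be illustrated with a minimal sketch. It is not the ARTDL implementation: it assumes that each test input has already been mapped to a feature vector (how the features are extracted is not specified here), and it assumes a max-min selection rule in the style of fixed-size-candidate-set ART. The names `fed`, `select_next`, `art_first_failure`, the candidate pool size `k`, and the `is_failure` oracle are all hypothetical.

```python
import numpy as np


def fed(a, b):
    """Euclidean distance between two feature vectors (assumed precomputed)."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))


def select_next(candidates, passed):
    """Index of the candidate whose nearest non-failure-causing test is farthest away.

    Assumption: a max-min rule, as in fixed-size-candidate-set ART.
    """
    candidates = np.asarray(candidates, dtype=float)
    passed = np.asarray(passed, dtype=float)
    # Pairwise distances, shape (n_candidates, n_passed).
    dists = np.linalg.norm(candidates[:, None, :] - passed[None, :, :], axis=2)
    return int(np.argmax(dists.min(axis=1)))


def art_first_failure(features, is_failure, k=10, seed=0):
    """Run tests until the first failure; return how many tests were executed.

    features   : array of shape (n_tests, n_features), one row per labeled test input
    is_failure : hypothetical oracle, is_failure(i) is True if test i fails
    k          : hypothetical size of the random candidate pool per step
    Returns None if no test fails.
    """
    rng = np.random.default_rng(seed)
    features = np.asarray(features, dtype=float)
    remaining = list(range(len(features)))
    passed = []  # indices of executed, non-failure-causing tests
    executed = 0
    while remaining:
        if not passed:
            # Nothing to measure against yet, so pick purely at random.
            pick = int(rng.choice(remaining))
        else:
            pool = rng.choice(remaining, size=min(k, len(remaining)), replace=False)
            pick = int(pool[select_next(features[pool], features[passed])])
        remaining.remove(pick)
        executed += 1
        if is_failure(pick):
            return executed
        passed.append(pick)
    return None
```

Under these assumptions, `art_first_failure` returns the number of executed test cases up to and including the first failure, which is the quantity compared against plain RT in the results above.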
