Investigating the Effectiveness of Mutation Testing Tools in the Context of Deep Neural Networks

Verifying the correctness of implementations of machine learning algorithms such as neural networks has become a major topic because of, for example, their increasing use in safety-critical systems such as automated or autonomous vehicles. In contrast to evaluating the learning capabilities of such algorithms, verification, and testing in particular, aims at finding critical scenarios and at giving some form of guarantee with respect to the tests used. In this paper, we contribute to the area of testing machine learning algorithms and investigate the effectiveness of traditional mutation tools in the context of testing Deep Neural Networks. In particular, we try to answer the question of whether mutated neural networks can be identified by comparing their learning capabilities with those of the original network. To answer this question, we performed an empirical study using Java implementations of such networks and a mutation tool to create mutated neural network models. As an outcome, we are able to identify some mutations that are more likely to be detected than others.
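
The idea of identifying a mutant through its learning capability can be illustrated with a minimal sketch. It is not the study's setup: the single perceptron, the OR data set, the hand-written mutants (passed in as an `update` operator instead of being generated by a mutation tool), and the `isKilled` tolerance are all assumptions made for illustration.

```java
import java.util.function.DoubleBinaryOperator;

public class MutationSketch {
    // Toy data set (assumption): the logical OR function.
    static final double[][] INPUTS = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
    static final int[] LABELS = {0, 1, 1, 1};

    /**
     * Trains a single perceptron and returns its accuracy on the training data.
     * The update operator combines a weight with its correction term. The
     * original program uses addition; a mutant swaps in a different operator,
     * mimicking an arithmetic-operator mutation of the source code.
     */
    static double trainAndScore(DoubleBinaryOperator update) {
        double[] w = {0.0, 0.0};
        double bias = 0.0;
        double rate = 0.1;
        for (int epoch = 0; epoch < 50; epoch++) {
            for (int i = 0; i < INPUTS.length; i++) {
                int error = LABELS[i] - predict(w, bias, INPUTS[i]);
                for (int j = 0; j < w.length; j++) {
                    w[j] = update.applyAsDouble(w[j], rate * error * INPUTS[i][j]);
                }
                bias = update.applyAsDouble(bias, rate * error);
            }
        }
        int correct = 0;
        for (int i = 0; i < INPUTS.length; i++) {
            if (predict(w, bias, INPUTS[i]) == LABELS[i]) {
                correct++;
            }
        }
        return (double) correct / INPUTS.length;
    }

    // Step activation: outputs 1 if the weighted sum is positive, else 0.
    static int predict(double[] w, double bias, double[] x) {
        double sum = bias + w[0] * x[0] + w[1] * x[1];
        return sum > 0 ? 1 : 0;
    }

    // A mutant counts as detected ("killed") if its accuracy drops below the
    // original's by more than the tolerance (a hypothetical criterion).
    static boolean isKilled(double originalAcc, double mutantAcc, double tolerance) {
        return originalAcc - mutantAcc > tolerance;
    }

    public static void main(String[] args) {
        double original = trainAndScore((a, b) -> a + b);
        double minusMutant = trainAndScore((a, b) -> a - b);      // '+' replaced by '-'
        double scaledMutant = trainAndScore((a, b) -> a + 2 * b); // correction doubled
        System.out.println("minus mutant killed:  " + isKilled(original, minusMutant, 0.1));
        System.out.println("scaled mutant killed: " + isKilled(original, scaledMutant, 0.1));
    }
}
```

In this toy setting, replacing `+` with `-` prevents the perceptron from learning, so the mutant is detected, while doubling the correction term only rescales the weights and leaves the accuracy unchanged, so that mutant survives. This mirrors, on a very small scale, the observation that some mutations are more likely to be detected than others.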
