Multiple-Implementation Testing of Supervised Learning Software

Machine learning (ML) software, which implements ML algorithms, is widely used in many application domains such as finance, business, and engineering. Faults in ML software can cause substantial losses in these domains, so it is critical to test ML software effectively to detect and eliminate its faults. However, testing ML software is difficult, especially because of the challenge of producing test oracles for checking behavioral correctness (such as expected properties or expected test outputs). To tackle the test-oracle issue, in this paper, we present a novel black-box approach of multiple-implementation testing for supervised learning software. The insight underlying our approach is that there can be multiple independently written implementations of a supervised learning algorithm, and a majority of them may produce the expected output for a test input (even if none of these implementations is fault-free). In particular, our approach derives a pseudo-oracle for a test input by running the test input on n implementations of the supervised learning algorithm and then using the common test output produced by a majority (determined by a percentage threshold) of these n implementations. Our approach includes techniques to address challenges in multiple-implementation testing (and in testing supervised learning software generally): defining what constitutes a test case for supervised learning software, and resolving inconsistent algorithm configurations across implementations. Our evaluations show that multiple-implementation testing is effective in detecting real faults in real-world ML software (even widely used software), including 5 faults from 10 Naive Bayes implementations and 4 faults from 20 k-nearest neighbor implementations.
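As a concrete illustration, below is a minimal Python sketch of this majority-vote pseudo-oracle derivation. It assumes each implementation is wrapped as a function mapping a training set and a test input to a predicted label; the function names, the wrapper interface, and the 50% default threshold are illustrative assumptions rather than details taken from the paper.

```python
from collections import Counter
from typing import Any, Callable, Sequence

# Hypothetical wrapper interface: each implementation is a callable
# (train_set, test_input) -> predicted label.
Implementation = Callable[[Any, Any], Any]


def derive_pseudo_oracle(
    implementations: Sequence[Implementation],
    train_set: Any,
    test_input: Any,
    threshold: float = 0.5,  # illustrative default, not from the paper
) -> Any:
    """Run `test_input` on all n implementations and return the common
    output produced by more than `threshold` of them, or None if no
    output clears the threshold (i.e., no pseudo-oracle exists)."""
    outputs = [impl(train_set, test_input) for impl in implementations]
    if not outputs:
        return None
    label, count = Counter(outputs).most_common(1)[0]
    if count / len(outputs) > threshold:
        return label  # the majority output serves as the pseudo-oracle
    return None


def deviating_implementations(
    implementations: Sequence[Implementation],
    train_set: Any,
    test_input: Any,
    threshold: float = 0.5,
) -> list:
    """Flag implementations whose output deviates from the pseudo-oracle;
    each flagged implementation is a candidate for fault inspection."""
    oracle = derive_pseudo_oracle(implementations, train_set, test_input, threshold)
    if oracle is None:
        return []  # no majority output, so no deviation can be determined
    return [
        impl
        for impl in implementations
        if impl(train_set, test_input) != oracle
    ]
```

In a testing workflow along these lines, each implementation flagged as deviating from the pseudo-oracle on some test input would then be inspected manually to confirm whether the deviation stems from a real fault or from a legitimate difference in algorithm configuration across implementations.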
