Testing and validating machine learning classifiers by metamorphic testing

Machine learning algorithms provide core functionality in many application domains, such as bioinformatics and computational linguistics. However, it is difficult to detect faults in such applications because there is often no "test oracle" to verify the correctness of the computed outputs. To help address this software quality problem, we present a technique for testing the implementations of the machine learning classification algorithms that support such applications. Our approach is based on metamorphic testing, a technique that has been shown to be effective in alleviating the oracle problem. We also present a case study on a real-world machine learning application framework and discuss how programmers implementing machine learning algorithms can avoid the common pitfalls discovered in our study. In addition, we conduct mutation analysis and cross-validation, which reveal that our method is highly effective in killing mutants and that observing the expected cross-validation result alone is not sufficient to detect faults in a supervised classification program. The effectiveness of metamorphic testing is further confirmed by its detection of real faults in a popular open-source classification program.
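To make the idea of metamorphic testing concrete, the following is a minimal sketch and not the implementation or the relations evaluated in the study. It assumes a small hand-written k-nearest-neighbour classifier (`knn_predict`) and one illustrative metamorphic relation: applying the same permutation to the attribute columns of the training data and of the test cases should leave every predicted label unchanged. All function names, the synthetic integer data, and the choice of relation are assumptions made for illustration only. No oracle for the "correct" label is needed; the check only compares the outputs of a source execution with those of a follow-up execution.

```python
import random


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Squared Euclidean distance on integer features keeps the arithmetic exact;
    # distance ties are broken by the training index.
    order = sorted(
        range(len(train_X)),
        key=lambda i: (sum((a - b) ** 2 for a, b in zip(train_X[i], query)), i),
    )
    votes = {}
    for i in order[:k]:
        votes[train_y[i]] = votes.get(train_y[i], 0) + 1
    # Vote ties are broken by the smallest label so the result is deterministic.
    return min(votes, key=lambda label: (-votes[label], label))


def permute_columns(rows, perm):
    """Reorder the attributes of every row according to `perm`."""
    return [[row[j] for j in perm] for row in rows]


def check_permutation_relation(classifier, train_X, train_y, queries, perm):
    """Return True if permuting attributes leaves all predictions unchanged."""
    # Source execution on the original data.
    source = [classifier(train_X, train_y, q) for q in queries]
    # Follow-up execution on the transformed data.
    follow_X = permute_columns(train_X, perm)
    follow_queries = permute_columns(queries, perm)
    follow = [classifier(follow_X, train_y, q) for q in follow_queries]
    return source == follow


# Synthetic data (hypothetical): 30 training samples, 4 integer attributes.
rng = random.Random(0)
train_X = [[rng.randint(0, 20) for _ in range(4)] for _ in range(30)]
train_y = [rng.randint(0, 1) for _ in range(30)]
queries = [[rng.randint(0, 20) for _ in range(4)] for _ in range(10)]

relation_holds = check_permutation_relation(
    knn_predict, train_X, train_y, queries, perm=[2, 0, 3, 1]
)
print(relation_holds)
```

A violation of the relation (a result of `False`) would signal a fault in the classifier under test, whereas a satisfied relation does not prove correctness; it only means that this particular relation revealed no failure.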
