Manifesting Bugs in Machine Learning Code: An Explorative Study with Mutation Testing

Statistical machine learning is now widely adopted in domains such as data mining, image recognition, and automated driving. However, software quality assurance for machine learning is still in its infancy. While recent efforts have focused on improving the quality of training data and trained models, this paper focuses on code-level bugs in the implementations of machine learning algorithms. In this explorative study, we simulated program bugs by mutating Weka implementations of several classification algorithms. We observed that 8%-40% of the logically non-equivalent executable mutants were statistically indistinguishable from their golden versions. Moreover, another 15%-36% of the mutants were stubborn: they did not perform significantly worse than a reference classifier on at least one natural data set. We also experimented with several approaches to killing those stubborn mutants. Preliminary results indicate that bugs in machine learning code may negatively affect statistical properties such as robustness and learning curves, but they can be very difficult to detect because effective oracles are lacking.
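To make the notion of a mutant that is "statistically indistinguishable" from its golden version concrete, the following is a minimal sketch under stated assumptions, not the procedure used in the study. It does not use Weka: a toy nearest-centroid classifier stands in for the classification algorithms, the `mutate` flag is a hypothetical arithmetic mutant (dividing by n + 1 instead of n), the synthetic two-class data is an assumption, and McNemar's test is one plausible choice of significance test that the abstract itself does not specify.

```python
import math
import random


def train_centroids(X, y, mutate=False):
    """Nearest-centroid training: one mean vector per class.

    With mutate=True the mean divides by (n + 1) instead of n.
    This is a hypothetical mutant used only for illustration.
    """
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    offset = 1 if mutate else 0
    return {c: [s / (counts[c] + offset) for s in sums[c]] for c in sums}


def predict(centroids, x):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )


def mcnemar_p(y_true, pred_golden, pred_mutant):
    """Two-sided McNemar test with continuity correction (chi-square, 1 dof)."""
    triples = list(zip(y_true, pred_golden, pred_mutant))
    b = sum(1 for t, g, m in triples if g == t and m != t)  # only golden correct
    c = sum(1 for t, g, m in triples if g != t and m == t)  # only mutant correct
    if b + c == 0:
        return 1.0  # the two versions never disagree on correctness
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2))


def make_blobs(rng, n_per_class):
    """Synthetic two-class data: Gaussian blobs centered at (1, 1) and (4, 4)."""
    X, y = [], []
    for label, center in ((0, 1.0), (1, 4.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            y.append(label)
    return X, y


rng = random.Random(0)  # fixed seed so the run is repeatable
X_train, y_train = make_blobs(rng, 50)
X_test, y_test = make_blobs(rng, 100)

golden = train_centroids(X_train, y_train)
mutant = train_centroids(X_train, y_train, mutate=True)

pred_golden = [predict(golden, x) for x in X_test]
pred_mutant = [predict(mutant, x) for x in X_test]

p_value = mcnemar_p(y_test, pred_golden, pred_mutant)
# A large p-value means the test set cannot separate the mutant from the
# golden version, even though the two programs are not logically equivalent.
print(f"McNemar p-value (golden vs. mutant): {p_value:.4f}")
```

In this sketch the mutant computes different centroids than the golden version, so the two are not logically equivalent, yet a test based only on held-out accuracy may still fail to distinguish them. That is the kind of oracle weakness the abstract describes.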
