Mining software defect data to support software testing management

Achieving high-quality software would be easier if effective software development practices were known and deployed in appropriate contexts. Because our theoretical knowledge of the underlying principles of software development is far from complete, empirical analysis of past experience in software projects is essential for acquiring useful software practices. As advances in software technology continue to facilitate automated tracking and data collection, more software data become available. Our research aims to develop methods to exploit such data for improving software development practices.

This paper proposes an empirical approach, based on the analysis of defect data, that supports software testing management in two ways: (1) construction of a predictive model for defect repair times, and (2) a method for assessing testing quality across multiple releases. The approach employs data mining techniques, including statistical methods and machine learning. To illustrate the proposed approach, we present a case study using the defect reports created during the development of three releases of a large medical software system produced by a large, well-established software company. We validate our proposed testing quality assessment using a statistical test at a significance level of 0.1. Despite the limitations of the available data, our predictive models give accuracies as high as 93%.
