Software defect association mining and defect correction effort prediction

Much current software defect prediction work focuses on the number of defects remaining in a software system. In this paper, we present association rule mining based methods to predict defect associations and defect correction effort. The aim is to help developers detect software defects and to assist project managers in allocating testing resources more effectively. We applied the proposed methods to the SEL defect data, which covers more than 200 projects over more than 15 years. The results show that, for defect association prediction, the accuracy is very high and the false-negative rate is very low. Likewise, for defect correction effort prediction, the accuracy of both defect isolation effort prediction and defect correction effort prediction is high. We compared the defect correction effort prediction method with other types of methods (PART, C4.5, and Naive Bayes) and show that accuracy is improved by at least 23 percent. We also evaluated the impact of support and confidence levels on prediction accuracy, false-negative rate, false-positive rate, and the number of rules. We found that higher support and confidence levels may not result in higher prediction accuracy, and that a sufficient number of rules is a precondition for high prediction accuracy.
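To make the support and confidence levels mentioned above concrete, the following is a minimal sketch of how the two measures are conventionally computed for a rule of the form "antecedent implies consequent". It is not the method of the paper: the defect type names, the toy transactions, and the helper functions `support` and `confidence` are all hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, List

# Hypothetical toy data (not from the SEL database): each transaction is the
# set of defect types recorded together for one component.
transactions: List[FrozenSet[str]] = [
    frozenset({"interface", "data_value"}),
    frozenset({"interface", "data_value", "logic"}),
    frozenset({"interface", "logic"}),
    frozenset({"data_value"}),
    frozenset({"interface", "data_value"}),
]


def count(itemset: FrozenSet[str], data: List[FrozenSet[str]]) -> int:
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in data if itemset <= t)


def support(itemset: FrozenSet[str], data: List[FrozenSet[str]]) -> float:
    """Fraction of all transactions that contain `itemset`."""
    return count(itemset, data) / len(data)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               data: List[FrozenSet[str]]) -> float:
    """Among transactions containing the antecedent, the fraction that
    also contain the consequent. Returns 0.0 if the antecedent never occurs."""
    base = count(antecedent, data)
    if base == 0:
        return 0.0
    return count(antecedent | consequent, data) / base


# Rule under inspection: {interface} -> {data_value}
antecedent = frozenset({"interface"})
consequent = frozenset({"data_value"})

rule_support = support(antecedent | consequent, transactions)
rule_confidence = confidence(antecedent, consequent, transactions)

print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

Raising the minimum support or confidence threshold discards rules that fall below it, so stricter thresholds leave fewer rules available for prediction. This is one way to read the finding that a sufficient number of rules is a precondition for high accuracy.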
