How simple is software defect detection?

Software defect detectors take structural metrics of code as input and output a prediction of how faulty a code module might be. Previous studies have shown that such metrics may be confused by the high correlation between metrics. To resolve this, feature subset selection (FSS) techniques such as principal components analysis can be used to reduce the dimensionality of metric sets in the hope of creating smaller and more accurate detectors. This study benchmarks several FSS techniques and reports several studies where a large set of metrics was reduced to a handful with little loss of detection accuracy. This result raises the possibility that software defect detection may be much simpler than previously believed.
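
To make the idea of pruning correlated metrics concrete, here is a minimal sketch of one simple filter-style approach: rank metrics by their correlation with the defect count, then skip any metric that is nearly redundant with one already kept. This is not the set of FSS techniques benchmarked in the study, and it is not principal components analysis; the metric names, the toy values, the `select_metrics` helper, and the 0.9 redundancy threshold are all assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / sqrt(var_x * var_y)


def select_metrics(metrics, defects, max_redundancy=0.9):
    """Greedy filter (hypothetical helper, not from the study).

    Visit metrics from most to least correlated with the defect counts and
    keep one only if it is not highly correlated with a metric already kept.
    """
    ranked = sorted(
        metrics,
        key=lambda name: abs(pearson(metrics[name], defects)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        redundant = any(
            abs(pearson(metrics[name], metrics[other])) >= max_redundancy
            for other in kept
        )
        if not redundant:
            kept.append(name)
    return kept


# Toy data (invented): one value per module for each metric.
metrics = {
    "loc": [10, 20, 30, 40, 50, 60],
    "volume": [21, 39, 62, 80, 101, 119],  # tracks loc almost exactly
    "branches": [1, 5, 2, 6, 3, 7],
}
defects = [0, 1, 1, 2, 2, 3]

kept = select_metrics(metrics, defects)
print(kept)
```

On this toy data, `loc` and `volume` are almost perfectly correlated, so only one of them survives alongside `branches`. A detector would then be trained on the kept columns only, and its accuracy compared against one trained on the full metric set.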
