Using the Support Vector Machine as a Classification Method for Software Defect Prediction with Static Code Metrics

The automated detection of defective modules within software systems could lead to reduced development costs and more reliable software. In this work the static code metrics for a collection of modules contained within eleven NASA data sets are used with a Support Vector Machine classifier. A rigorous sequence of pre-processing steps were applied to the data prior to classification, including the balancing of both classes (defective or otherwise) and the removal of a large number of repeating instances. The Support Vector Machine in this experiment yields an average accuracy of 70% on previously unseen data.

[1]  Chih-Jen Lin,et al.  A Practical Guide to Support Vector Classication , 2008 .

[2]  Tim Menzies,et al.  Data Mining Static Code Attributes to Learn Defect Predictors , 2007, IEEE Transactions on Software Engineering.

[3]  Martin Shepperd,et al.  Data Sets and Data Quality in Software Engineering: Eight Years On , 2016, PROMISE.

[4]  A. Atiya,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2005, IEEE Transactions on Neural Networks.

[5]  Douglas H. Fisher,et al.  Ordering Effects in Incremental Learning , 1993 .

[6]  Thomas G. Dietterich Adaptive computation and machine learning , 1998 .

[7]  Neil Davey,et al.  Using sampling methods to improve binding site predictions , 2006, ESANN.

[8]  Ian H. Witten,et al.  Data mining - practical machine learning tools and techniques, Second Edition , 2005, The Morgan Kaufmann series in data management systems.

[9]  M. Shepperd,et al.  A critique of cyclomatic complexity as a software metric , 1988, Softw. Eng. J..

[10]  Ian Sommerville,et al.  Software Engineering: (Update) (8th Edition) (International Computer Science) , 2006 .

[11]  Edward Y. Chang,et al.  Class-Boundary Alignment for Imbalanced Dataset Learning , 2003 .

[12]  Bart Baesens,et al.  Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings , 2008, IEEE Transactions on Software Engineering.

[13]  Alexander J. Smola,et al.  Learning with Kernels: support vector machines, regularization, optimization, and beyond , 2001, Adaptive computation and machine learning series.

[14]  Zhan Li,et al.  A practical method for the software fault-prediction , 2007, 2007 IEEE International Conference on Information Reuse and Integration.

[15]  Maurice H. Halstead,et al.  Elements of software science (Operating and programming systems series) , 1977 .

[16]  G. D. Frewin,et al.  M.H. Halstead's Software Science - a critical examination , 1982, ICSE '82.

[17]  Anas N. Al-Rabadi,et al.  A comparison of modified reconstructability analysis and Ashenhurst‐Curtis decomposition of Boolean functions , 2004 .

[18]  Michael Stonebraker,et al.  The Morgan Kaufmann Series in Data Management Systems , 1999 .

[19]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[20]  Stefano Ferilli,et al.  Avoiding Order Effects in Incremental Learning , 2005, AI*IA.

[21]  Karim O. Elish,et al.  Predicting defect-prone software modules using support vector machines , 2008, J. Syst. Softw..

[22]  H. E. Dunsmore,et al.  Software Science Revisited: A Critical Analysis of the Theory and Its Empirical Support , 1983, IEEE Transactions on Software Engineering.