AODE for Source Code Metrics for Improved Software Maintainability

Software metrics are collected at various phases of the software development process to help monitor and control software quality. However, software quality control is difficult because the relationship between these metrics and the attributes of a software development process is complex. To address this problem, many techniques have been introduced into the software maintainability domain. In this paper, we propose a novel classification method, Aggregating One-Dependence Estimators (AODE), to support and enhance our understanding of software metrics and their relationship to software quality. Experiments show that AODE performs considerably better than eight traditional classification methods, which makes it a promising method for software quality prediction. Furthermore, we present a feature selection method based on Symmetrical Uncertainty (SU) that reduces the number of source code metrics used in classification, making the classifiers more efficient without degrading their performance. Our empirical study shows that SU is capable of selecting relevant metrics while preserving the original performance of the classifiers.
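The abstract does not spell out how SU is computed. As a minimal sketch, assuming the standard definition SU(X, Y) = 2 * (H(X) + H(Y) - H(X, Y)) / (H(X) + H(Y)) over discretized metric values, a filter of this kind could look like the following. The function names, the `threshold` parameter, and the idea of keeping every metric above a fixed cutoff are illustrative assumptions, not the paper's procedure.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * IG / (H(X) + H(Y)), ranging from 0 to 1.

    Assumes x and y are equal-length sequences of discrete values
    (continuous metrics would need to be discretized first).
    """
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant; no information to share
    h_xy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    info_gain = h_x + h_y - h_xy      # mutual information I(X; Y)
    return 2.0 * info_gain / (h_x + h_y)


def select_metrics(columns, labels, threshold):
    """Keep the metrics whose SU with the class label reaches the threshold.

    `columns` maps a (hypothetical) metric name to its discretized values;
    `threshold` is an assumed cutoff, not a value taken from the paper.
    """
    return [name for name, col in columns.items()
            if symmetrical_uncertainty(col, labels) >= threshold]
```

A metric that perfectly determines the class label scores 1, and a metric that is statistically independent of the label scores 0, so a cutoff between those extremes discards the least informative metrics before classification.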
