Software metrics data clustering for quality prediction

Software metrics are collected at various phases of the software development process. These metrics carry information about the software and can be used to predict its quality early in the software life cycle. Intelligent computing techniques such as data mining can be applied to the study of software quality by analyzing software metrics. Clustering analysis, one such data mining technique, is adopted here to build software quality prediction models in the early period of software testing. In this paper, three clustering methods, k-means, fuzzy c-means, and the Gaussian mixture model, are investigated on two real-world software metric datasets. The experimental results show that the best method for predicting software quality depends on the dataset at hand, and that clustering analysis has an advantage in software quality prediction because it can be used when little prior knowledge is available.
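To make the idea concrete, the following is a minimal sketch of clustering-based quality prediction using plain k-means, the simplest of the three methods named above. It is not the authors' implementation. The data are synthetic, the two metrics (lines of code and cyclomatic complexity) are chosen only for illustration, and the rule that the cluster with the larger centroid is "fault-prone" is an assumption standing in for the expert labeling a real study would need.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Initialize centers with k distinct samples drawn from the data.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every sample to every center, shape (n, k).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical metrics per module: [lines of code, cyclomatic complexity].
rng = np.random.default_rng(42)
low = rng.normal(loc=[100.0, 5.0], scale=[20.0, 1.0], size=(30, 2))
high = rng.normal(loc=[900.0, 40.0], scale=[50.0, 4.0], size=(10, 2))
X = np.vstack([low, high])

# Standardize so that both metrics contribute on a comparable scale.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

labels, centers = kmeans(Z, k=2)

# Assumption: the cluster whose centroid has the larger metric values
# is treated as the fault-prone group.
risky = int(np.argmax(centers.sum(axis=1)))
fault_prone = labels == risky
print(int(fault_prone.sum()), "of", len(X), "modules flagged as fault-prone")
```

No fault labels are used during clustering, which is the sense in which the approach needs little prior knowledge. Fuzzy c-means and a Gaussian mixture model would replace the hard assignment step with soft memberships, and the abstract's finding suggests comparing all three on each dataset rather than assuming one is best.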
