Analysis of Clustering Techniques for Software Quality Prediction

Clustering is the unsupervised classification of patterns into groups. A clustering algorithm partitions a data set into several groups such that similarity within a group is greater than similarity between groups. The clustering problem has been addressed in many contexts and by researchers in many disciplines, which reflects its broad appeal and usefulness as a step in exploratory data analysis. Software fault prediction needs methods based on unsupervised learning that can predict the fault-proneness of program modules when fault labels for those modules are not available. Clustering is one such method. This paper presents a case study of different clustering techniques and analyzes their performance.
