An Empirical Study on Spectral Clustering-based Software Defect Detection

Software defect detection is essential in software development. Most existing approaches apply Supervised Machine Learning (SML) techniques to this task. However, SML techniques require a large amount of manually labelled data for model training, and producing those labels is time-consuming and laborious. An alternative is UnSupervised Machine Learning (USML), which does not require labelled datasets and has already been applied to software defect detection. Spectral clustering, one of the USML approaches, has shown promising performance in software defect detection. At its core is a similarity algorithm, which computes the similarity between the metric values of software entities in order to detect software defects. Yet current studies on spectral clustering-based software defect detection models rarely consider the impact of different similarity algorithms on the detection results. To address this problem, we conduct an empirical study to investigate the impact of similarity algorithms on spectral clustering-based software defect detection models. We compare three similarity algorithms: k-nearest neighbours, fully connected, and vector dot product. We conduct experiments on two real-world datasets, AEEEM and PROMISE, and the experimental results show that the fully connected algorithm performs better than the other two algorithms in spectral clustering-based software defect detection.
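
To make the three similarity algorithms named above concrete, the following is a minimal sketch of how each could build the affinity matrix that spectral clustering operates on. It is an illustration under stated assumptions and not the exact formulation used in the study: the function names are hypothetical, the fully connected graph is assumed to use a Gaussian kernel with width sigma, the k-nearest-neighbour graph is assumed to be binary and symmetrised, negative dot products are assumed to be clipped to zero, and the two-way split is assumed to follow the sign of the second-smallest eigenvector of the normalised Laplacian.

```python
import numpy as np


def _squared_distances(X):
    # Pairwise squared Euclidean distances between rows (software entities).
    diff = X[:, None, :] - X[None, :, :]
    return (diff ** 2).sum(axis=2)


def knn_affinity(X, k=3):
    # Assumption: binary graph, an edge exists if either entity is among
    # the other's k nearest neighbours.
    d2 = _squared_distances(X)
    np.fill_diagonal(d2, np.inf)  # an entity is not its own neighbour
    nearest = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((X.shape[0], X.shape[0]))
    rows = np.repeat(np.arange(X.shape[0]), k)
    W[rows, nearest.ravel()] = 1.0
    return np.maximum(W, W.T)  # symmetrise


def fully_connected_affinity(X, sigma=1.0):
    # Assumption: every pair is connected, weighted by a Gaussian kernel.
    W = np.exp(-_squared_distances(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def dot_product_affinity(X):
    # Assumption: similarity is the inner product of metric vectors,
    # with negative values clipped to zero.
    W = np.clip(X @ X.T, 0.0, None)
    np.fill_diagonal(W, 0.0)
    return W


def spectral_bipartition(W):
    # Normalised Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    v = vecs[:, 1] * d_inv_sqrt  # second-smallest eigenvector, rescaled
    return (v > 0).astype(int)   # split entities by sign


# Toy metric matrix: 8 entities, 2 metrics, two visibly separate groups.
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2],
              [3.0, 3.0], [3.2, 3.1], [3.1, 3.3], [3.3, 3.2]])
labels = spectral_bipartition(fully_connected_affinity(X, sigma=1.0))
```

Swapping `fully_connected_affinity` for `knn_affinity` or `dot_product_affinity` changes only the graph, which is the factor the study varies. The cluster indices returned here are arbitrary, so deciding which cluster is the defective one requires a separate labelling rule that this sketch does not cover.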