Improving Relevancy Filter Methods for Cross-Project Defect Prediction

Context: Cross-project defect prediction (CPDP) has been an active research topic. One technique for CPDP is the relevancy filter, which uses a clustering algorithm to select a useful subset of the cross-project data. Its performance relies heavily on the quality of the clustering, so replacing the simple clustering algorithms used in past studies with a more advanced one may improve performance. Objective: To propose and examine a new relevancy filter method based on an advanced clustering method, DBSCAN (Density-Based Spatial Clustering). Method: We conducted an experiment that examined the predictive performance of the proposed method. The experiment compared three relevancy filter methods, namely Burak-filter, Peters-filter, and the proposed method, using 56 project datasets and four prediction models. Results: The predictive performance measures supported the proposed method: it outperformed Burak-filter and Peters-filter in terms of AUC and g-measure. Conclusion: The proposed method achieved better prediction performance than the conventional methods. The results suggest that exploring advanced clustering algorithms could contribute to cross-project defect prediction.
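The abstract does not spell out the filtering rule, so the following is only a minimal sketch of how a DBSCAN-based relevancy filter could work, not the authors' implementation. It assumes that cross-project (source) and target instances are clustered together and that a source instance is kept only if its cluster also contains at least one target instance. The function names (`dbscan`, `dbscan_relevancy_filter`), the parameters `eps` and `min_samples`, and the toy two-metric data are all hypothetical; the hand-written DBSCAN stands in for a library implementation so the example stays dependency-free.

```python
import math

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # label for points not yet processed


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: returns one cluster label per point, -1 for noise."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


def dbscan_relevancy_filter(source_rows, target_rows, eps=1.0, min_samples=3):
    """Assumed rule: keep source rows that share a cluster with a target row.

    Returns the indices of the selected rows in source_rows.
    """
    points = list(source_rows) + list(target_rows)
    labels = dbscan(points, eps, min_samples)
    n_source = len(source_rows)
    target_clusters = {lab for lab in labels[n_source:] if lab != NOISE}
    return [i for i in range(n_source) if labels[i] in target_clusters]


# Hypothetical toy data: each row is a vector of two software metrics.
source = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),   # close to the target project
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),   # dense, but far from target
          (9.0, 0.0)]                           # isolated point (noise)
target = [(0.1, 0.1), (0.05, 0.05)]

selected = dbscan_relevancy_filter(source, target, eps=0.5, min_samples=3)
print(selected)  # [0, 1, 2]
```

In this sketch the selected source rows would then be used to train the prediction model in place of the full cross-project data. In practice, metric values would normally be scaled before clustering, and `eps` and `min_samples` would need tuning per dataset.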