Manifold Learning for Cross-project Software Defect Prediction

Traditional software defect prediction studies usually build models from within-project data. In practice, however, local data repositories are often too small to train a software defect prediction model. Cross-project software defect prediction (CSDP) has recently been proposed to address this problem. Because the distributions of the source and target domains differ, existing CSDP models have not investigated how to use mixed-project data to predict defects in the target project. In this paper, we propose a method termed geodesic flow kernel software defect prediction (GFKSDP) to address the distribution mismatch between the source domain and the target domain. GFKSDP reduces the difference between the two domains by integrating an infinite number of subspaces that characterize the changes in geometric and statistical properties from the source domain to the target domain. The method adaptively determines its significant parameters, which reduces computational complexity. In addition, it performs better than traditional approaches in the unsupervised setting. Experimental results on the AEEEM and Relink datasets show that the proposed method effectively improves the performance of cross-project software defect prediction and outperforms state-of-the-art unsupervised methods.
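To make the idea of integrating over an infinite number of subspaces concrete, the following is a minimal sketch of a geodesic flow kernel in NumPy. It is not the GFKSDP implementation: it assumes PCA subspaces for both domains, a fixed subspace dimension d chosen by hand (the adaptive parameter selection mentioned above is not reproduced), and the standard Grassmann geodesic parameterization Phi(t) = A cos(t*theta) + B sin(t*theta). The function names pca_basis, geodesic_basis and geodesic_flow_kernel are hypothetical, and the data are random placeholders rather than defect metrics.

```python
import numpy as np


def pca_basis(X, d):
    """Orthonormal D x d basis for the top-d principal directions of X (n x D)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T


def geodesic_basis(Ps, Pt):
    """Return (A, B, theta) with Phi(t) = A cos(t*theta) + B sin(t*theta).

    Phi(0) spans span(Ps) and Phi(1) spans span(Pt).
    Assumption: Ps.T @ Pt is invertible (no source direction is
    orthogonal to the whole target subspace).
    """
    M = Ps.T @ Pt
    # Part of Pt orthogonal to span(Ps), rescaled by M^{-1}.
    W = (Pt - Ps @ M) @ np.linalg.inv(M)
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    theta = np.arctan(s)  # principal angles between the two subspaces
    return Ps @ Vt.T, U, theta


def geodesic_flow_kernel(Ps, Pt):
    """Closed form of G = integral over t in [0, 1] of Phi(t) Phi(t)^T."""
    A, B, theta = geodesic_basis(Ps, Pt)
    sinc2 = np.sinc(2.0 * theta / np.pi)  # sin(2*theta) / (2*theta), 1 at theta = 0
    a = 0.5 * (1.0 + sinc2)                            # integral of cos^2(t*theta)
    b = 0.5 * np.sin(theta) * np.sinc(theta / np.pi)   # integral of cos*sin
    c = 0.5 * (1.0 - sinc2)                            # integral of sin^2(t*theta)
    return (A * a) @ A.T + (A * b) @ B.T + (B * b) @ A.T + (B * c) @ B.T


# Placeholder data standing in for source and target project metrics.
rng = np.random.default_rng(0)
Xs = rng.normal(size=(60, 10))
Xt = rng.normal(size=(50, 10)) * np.linspace(0.5, 2.0, 10)

Ps = pca_basis(Xs, d=3)
Pt = pca_basis(Xt, d=3)
G = geodesic_flow_kernel(Ps, Pt)

# Domain-invariant similarity between source and target instances.
K = Xs @ G @ Xt.T
```

Under these assumptions, the matrix K can be passed to any kernel-based or nearest-neighbor learner, with x^T G y taking the place of the ordinary inner product between a source instance x and a target instance y.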