CFPS: Collaborative filtering based source projects selection for cross-project defect prediction

Abstract: Software defect prediction helps developers allocate available resources by identifying defect-prone modules before the testing phase. In the past decade, cross-project defect prediction (CPDP) has gained more attention than within-project defect prediction (WPDP), because WPDP is usually ineffective when training data are scarce owing to the absence of historical defect data. Most current CPDP studies focus on selecting appropriate training instances to improve prediction performance, while few pay attention to selecting appropriate source projects. In practice, however, source project selection is the basis of and prerequisite for training instance selection, especially now that a growing amount of open-source software defect data is available. In the present study, we propose a Collaborative Filtering based source Projects Selection (CFPS) method for cross-project defect prediction. For a given new project, CFPS first calculates the similarity between that project and each historical project, which yields a similarity repository. CFPS then mines the applicability among historical projects to construct an applicability repository. Finally, using the similarity and applicability repositories, the popular user-based collaborative filtering algorithm is employed to recommend appropriate source projects for the new project. In the experiments, we empirically validate the importance and necessity of selecting appropriate source projects. The experimental results also demonstrate that the proposed CFPS method is feasible and effective.
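The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how similarity or applicability is measured, so the sketch assumes cosine similarity over hypothetical project-level metric profiles, and it assumes the applicability repository stores a performance score (for example AUC) obtained when one historical project is used as the source for another. The names `cosine_similarity`, `recommend_sources`, `k_neighbors`, and `top_n` are illustrative only. Historical projects play the role of users, candidate source projects play the role of items, and each candidate's score is the similarity-weighted average of its applicability to the new project's nearest neighbours.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two project profiles (assumed measure)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def recommend_sources(target_profile, hist_profiles, applicability,
                      k_neighbors=2, top_n=1):
    """User-based collaborative filtering over historical projects.

    hist_profiles: {project: feature vector} (hypothetical profiles).
    applicability: {h: {s: score}}, the assumed score obtained when
        project s is the source and project h is the target.
    Returns (top_n recommended sources, all predicted scores).
    """
    names = list(hist_profiles)

    # Step 1: similarity repository for the new project.
    sims = {h: cosine_similarity(target_profile, hist_profiles[h])
            for h in names}

    # Keep only the k most similar historical projects as neighbours.
    neighbors = sorted(names, key=lambda h: sims[h], reverse=True)[:k_neighbors]

    # Steps 2-3: predict each candidate's applicability to the new project
    # as a similarity-weighted average over the neighbours' known scores.
    scores = {}
    for s in names:
        num = den = 0.0
        for h in neighbors:
            if h == s or s not in applicability.get(h, {}):
                continue  # a project is never its own source
            num += sims[h] * applicability[h][s]
            den += abs(sims[h])
        if den > 0:
            scores[s] = num / den

    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n], scores


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    profiles = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0]}
    app = {
        "A": {"B": 0.8, "C": 0.4},
        "B": {"A": 0.7, "C": 0.5},
        "C": {"A": 0.3, "B": 0.35},
    }
    best, all_scores = recommend_sources([1.0, 0.05], profiles, app)
    print(best, all_scores)
```

In this toy setup the new project resembles A and B far more than C, so the recommendation is driven by which sources worked well for A and B. Restricting the average to the nearest neighbours is one common design choice for user-based collaborative filtering; the paper may weight or select neighbours differently.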
