Which Is More Important for Cross-Project Defect Prediction: Instance or Feature?

Software defect prediction plays an important role in software testing: a prediction model can be built from a project's historical data. For a new project, however, a good model cannot be built because little historical data is available. Researchers have therefore proposed cross-project defect prediction (CPDP) methods, which share historical data across projects. In practice, cross-project datasets may suffer from both instance distribution differences and feature redundancy. To investigate which matters more for CPDP, instances or features, we take instance filtering and feature selection as representative treatments and compare their effectiveness. Our experiments on the NASA and PROMISE datasets indicate that feature selection improves CPDP performance more than instance filtering does. We conclude that features may be more important than instances for CPDP.
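
As a concrete illustration of the two treatments compared above, the following is a minimal sketch of a CPDP pipeline in Python, assuming scikit-learn. The Burak-style nearest-neighbor filter and the ANOVA-based selector are stand-ins chosen for illustration, not necessarily the exact methods used in the study, and the names burak_filter and run_cpdp are hypothetical helpers.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

def burak_filter(X_src, y_src, X_tgt, k=10):
    """Instance filtering (Burak-style stand-in): keep only the k
    source-project instances nearest to each target-project instance."""
    y_src = np.asarray(y_src)
    nn = NearestNeighbors(n_neighbors=min(k, len(X_src))).fit(X_src)
    _, idx = nn.kneighbors(X_tgt)
    keep = np.unique(idx.ravel())  # drop duplicate neighbors
    return X_src[keep], y_src[keep]

def run_cpdp(X_src, y_src, X_tgt, y_tgt, mode, k=10):
    """Train on source-project data and evaluate AUC on the target
    project, after applying one of the two data treatments."""
    scaler = StandardScaler().fit(X_src)
    X_src, X_tgt = scaler.transform(X_src), scaler.transform(X_tgt)
    if mode == "instance":   # instance filtering
        X_src, y_src = burak_filter(X_src, y_src, X_tgt, k)
    elif mode == "feature":  # feature selection (ANOVA F-score stand-in)
        sel = SelectKBest(f_classif, k=min(k, X_src.shape[1]))
        X_src = sel.fit_transform(X_src, y_src)
        X_tgt = sel.transform(X_tgt)
    clf = LogisticRegression(max_iter=1000).fit(X_src, y_src)
    return roc_auc_score(y_tgt, clf.predict_proba(X_tgt)[:, 1])
```

Running run_cpdp with mode="instance" and mode="feature" over the same source/target project pairs and comparing the resulting AUC values gives the kind of head-to-head comparison the abstract describes.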
