Cross-Project Change-Proneness Prediction

Software change-proneness prediction, which predicts whether class files in a project will be changed in the next release, can help developers focus on preventive actions to reduce maintenance costs, and help managers allocate resources more effectively. Prior studies found that change-proneness prediction works well when there is a sufficient amount of training data to build a model. However, sufficient training data is often unavailable for projects with limited history, especially new projects. To address this issue, cross-project change-proneness prediction has been proposed: a prediction model is built using data from another project (i.e., a source project) and used to predict change-proneness in a target project. Because a large number of candidate source projects are available, one challenge for cross-project change-proneness prediction is how to automatically select, for a given target project, a source project that yields good prediction accuracy on it. In this paper, we propose a selective cross-project (SCP) model for change-proneness prediction. SCP automatically finds the source project whose data distribution is most similar to that of the target project by measuring the distribution similarity between source and target projects. We evaluate SCP in an empirical study on 14 open-source projects, comparing it with the two most closely related change-proneness models: RCP (Random Cross-Project prediction), proposed by Malhotra and Bansal, and CLAMI+, developed by Yan et al. Experimental results show that SCP outperforms RCP and CLAMI+ by 25.34% and 4.30% in terms of AUC, respectively, and by 171.42% and 172.31% in terms of cost-effectiveness, respectively.
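The abstract does not detail which similarity measure SCP uses. As a rough illustration of distribution-based source-project selection, the Python sketch below picks the candidate source whose class-level metric distribution is closest to the target's, using per-feature two-sample Kolmogorov-Smirnov statistics averaged over all metric columns. The KS-based distance, the helper names (distribution_distance, select_source_project), and the toy project data are assumptions for illustration, not the paper's actual procedure.

import numpy as np
from scipy.stats import ks_2samp

def distribution_distance(source, target):
    # Average the two-sample KS statistic over metric columns;
    # a lower value means the two projects' metric distributions are closer.
    # NOTE: KS is an assumed measure here, not necessarily the one SCP uses.
    stats = [ks_2samp(source[:, j], target[:, j]).statistic
             for j in range(target.shape[1])]
    return float(np.mean(stats))

def select_source_project(candidates, target):
    # Pick the candidate project whose metric distribution is most
    # similar to the target project's (smallest distance).
    return min(candidates,
               key=lambda name: distribution_distance(candidates[name], target))

# Toy usage: rows are classes, columns are software metrics (synthetic data).
rng = np.random.default_rng(0)
candidates = {
    "project_a": rng.normal(0.0, 1.0, size=(200, 5)),
    "project_b": rng.normal(2.0, 1.0, size=(150, 5)),
}
target = rng.normal(0.0, 1.0, size=(120, 5))
print(select_source_project(candidates, target))  # expected: "project_a"

In a full pipeline, a classifier would then be trained on the selected source project's labeled data and applied to the unlabeled target project.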

[1] Xinli Yang et al. Condensing Class Diagrams With Minimal Manual Labeling Cost, 2016, 2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC).

[2] Andrew P. Bradley et al. The use of the area under the ROC curve in the evaluation of machine learning algorithms, 1997, Pattern Recognition.

[3] Qiang Yang et al. A Survey on Transfer Learning, 2010, IEEE Transactions on Knowledge and Data Engineering.

[4] Lionel C. Briand et al. Dynamic coupling measurement for object-oriented software, 2004, IEEE Transactions on Software Engineering.

[5] Olcay Taner Yildiz et al. Software defect prediction using Bayesian networks, 2012, Empirical Software Engineering.

[6] Bart Baesens et al. Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings, 2008, IEEE Transactions on Software Engineering.

[7] Mikael Lindvall. Are large C++ classes change-prone? An empirical investigation, 1998.

[8] F. Wilcoxon. Individual Comparisons by Ranking Methods, 1945.

[9] Sinno Jialin Pan et al. Transfer defect learning, 2013, 2013 35th International Conference on Software Engineering (ICSE).

[10] David Lo et al. Automating Change-Level Self-Admitted Technical Debt Determination, 2019, IEEE Transactions on Software Engineering.

[11] Premkumar T. Devanbu et al. Recalling the "imprecision" of cross-project defect prediction, 2012, SIGSOFT FSE.

[12] Hongfang Liu et al. Identifying and characterizing change-prone classes in two large-scale open-source products, 2007, Journal of Systems and Software.

[13] Ruchika Malhotra et al. Investigation of relationship between object-oriented metrics and change proneness, 2013, International Journal of Machine Learning and Cybernetics.

[14] Jing Li et al. The Qualitas Corpus: A Curated Collection of Java Code for Empirical Studies, 2010, 2010 Asia Pacific Software Engineering Conference.

[15] Steffen Herbold et al. Training data selection for cross-project defect prediction, 2013, PROMISE.

[16] Ling Xu et al. Automatically classifying software changes via discriminative topic model: Supporting multi-category and cross-project, 2016, Journal of Systems and Software.

[17] Irfan Ahmad et al. Three empirical studies on predicting software maintainability using ensemble methods, 2015, Soft Computing.

[18] Jehad Al Dallal. Object-oriented class maintainability prediction using internal quality attributes, 2013, Information and Software Technology.

[19] Ruchika Malhotra et al. Prediction & Assessment of Change Prone Classes Using Statistical & Machine Learning Techniques, 2017, Journal of Information Processing Systems.

[20] Ivor W. Tsang et al. Domain Adaptation via Transfer Component Analysis, 2009, IEEE Transactions on Neural Networks.

[21] N. Cliff. Ordinal methods for behavioral data analysis, 1996.

[22] David Lo et al. Supervised vs Unsupervised Models: A Holistic Look at Effort-Aware Just-in-Time Defect Prediction, 2017, 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME).

[23] Yuming Zhou et al. The ability of object-oriented metrics to predict change-proneness: a meta-analysis, 2011, Empirical Software Engineering.

[24] Jaechang Nam et al. CLAMI: Defect Prediction on Unlabeled Datasets (T), 2015, 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[25] Mengning Yang et al. Self-learning Change-prone Class Prediction, 2016, SEKE.

[26] Ling Xu et al. Automated change-prone class prediction on unlabeled dataset using unsupervised method, 2017, Information and Software Technology.

[27] Yuming Zhou et al. Examining the Potentially Confounding Effect of Class Size on the Associations between Object-Oriented Metrics and Change-Proneness, 2009, IEEE Transactions on Software Engineering.

[28] Daniele Romano et al. Using source code metrics to predict change-prone Java interfaces, 2011, 2011 27th IEEE International Conference on Software Maintenance (ICSM).

[29] Harald C. Gall et al. Can we predict types of code changes? An empirical analysis, 2012, 2012 9th IEEE Working Conference on Mining Software Repositories (MSR).

[30] Mehwish Riaz et al. A systematic review of software maintainability prediction and metrics, 2009, ESEM.

[31] Yuming Zhou et al. Predicting object-oriented software maintainability using multivariate adaptive regression splines, 2007, Journal of Systems and Software.

[32] Ruchika Malhotra et al. Cross project change prediction using open source projects, 2014, 2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI).

[33] Xinli Yang et al. TLEL: A two-layer ensemble learning approach for just-in-time defect prediction, 2017, Information and Software Technology.

[34] Premkumar T. Devanbu et al. How, and why, process metrics are better, 2013, 2013 35th International Conference on Software Engineering (ICSE).

[35] Shane McIntosh et al. Are Fix-Inducing Changes a Moving Target? A Longitudinal Case Study of Just-In-Time Defect Prediction, 2018, IEEE Transactions on Software Engineering.

[36] Mahmoud O. Elish et al. A suite of metrics for quantifying historical changes to predict future change-prone classes in object-oriented software, 2013, Journal of Software: Evolution and Process.

[37] Ruchika Malhotra et al. Examining the effectiveness of machine learning algorithms for prediction of change prone classes, 2014, 2014 International Conference on High Performance Computing & Simulation (HPCS).

[38] Ruchika Malhotra et al. An empirical study for software change prediction using imbalanced data, 2017, Empirical Software Engineering.

[39] Ian H. Witten et al. The WEKA data mining software: an update, 2009, SIGKDD Explorations.

[40] Lionel C. Briand et al. Data Mining Techniques for Building Fault-proneness Models in Telecom Java Software, 2007, The 18th IEEE International Symposium on Software Reliability (ISSRE '07).

[41] James M. Bieman et al. OO design patterns, design structure, and program changes: an industrial case study, 2001, Proceedings IEEE International Conference on Software Maintenance (ICSM 2001).

[42] Sinan Eski et al. An Empirical Study on Object-Oriented Metrics and Software Evolution in Order to Reduce Testing Costs by Predicting Change-Prone Classes, 2011, 2011 IEEE Fourth International Conference on Software Testing, Verification and Validation Workshops.

[43] David Lo et al. File-Level Defect Prediction: Unsupervised vs. Supervised Models, 2017, 2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM).

[44] David Lo et al. HYDRA: Massively Compositional Model for Cross-Project Defect Prediction, 2016, IEEE Transactions on Software Engineering.

[45] Akif Günes Koru et al. Comparing high-change modules and modules with the highest measurement values in two large-scale open-source products, 2005, IEEE Transactions on Software Engineering.

[46] C. van Koten et al. An application of Bayesian network for predicting object-oriented software maintainability, 2006, Information and Software Technology.

[47] Haruhiko Kaiya et al. Adapting a fault prediction model to allow inter language reuse, 2008, PROMISE '08.

[48] Yuming Zhou et al. Effort-aware just-in-time defect prediction: simple unsupervised models could be better than supervised models, 2016, SIGSOFT FSE.

[49] Harald C. Gall et al. Cross-project defect prediction: a large scale experiment on data vs. domain vs. process, 2009, ESEC/SIGSOFT FSE.

[50] Audris Mockus et al. A large-scale empirical study of just-in-time quality assurance, 2013, IEEE Transactions on Software Engineering.

[51] Ying Fu et al. Automated classification of software change messages by semi-supervised Latent Dirichlet Allocation, 2015, Information and Software Technology.

[52] David Lo et al. Predicting Crashing Releases of Mobile Applications, 2016, ESEM.

[53] James M. Bieman et al. Understanding change-proneness in OO software through visualization, 2003, 11th IEEE International Workshop on Program Comprehension.

[54] David Lo et al. Combined classifier for cross-project defect prediction: an extended empirical study, 2018, Frontiers of Computer Science.

[55] Ye Yang et al. An investigation on the feasibility of cross-project defect prediction, 2012, Automated Software Engineering.

[56] Tian Jiang et al. Personalized defect prediction, 2013, 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE).