Understanding the Automated Parameter Optimization on Transfer Learning for Cross-Project Defect Prediction: An Empirical Study

Data-driven defect prediction has become increasingly important in the software engineering process. Since data from a single software project are often insufficient for training a reliable defect prediction model, transfer learning, which borrows data/knowledge from other projects to facilitate model building for the current project, namely cross-project defect prediction (CPDP), is a natural choice. Most CPDP techniques involve two major steps, i.e., transfer learning and classification, each of which has at least one parameter to be tuned to achieve optimal performance. This practice fits well with the purpose of automated parameter optimization. However, there is a lack of thorough understanding of the impacts of automated parameter optimization on various CPDP techniques. In this paper, we present the first empirical study that looks into such impacts on 62 CPDP techniques, 13 of which are chosen from the existing CPDP literature while the remaining 49 have not been explored before. We build defect prediction models over 20 real-world software projects of different scales and characteristics. Our findings demonstrate that: (1) Automated parameter optimization substantially improves the defect prediction performance of 77% of the CPDP techniques with a manageable computational cost; thus more effort on this aspect is warranted in future CPDP studies. (2) Transfer learning is of ultimate importance in CPDP: given a tight computational budget, it is more cost-effective to focus on optimizing the parameter configuration of transfer learning algorithms. (3) The research on CPDP is far from mature, as it is ‘not difficult’ to find a better alternative by combining existing transfer learning and classification techniques. This finding provides important insights about the future design of CPDP techniques.
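To make the two-step setting concrete, the sketch below illustrates, in Python, how the parameters of a transfer learning step and a classifier could be tuned jointly for CPDP via random search. It is a minimal illustration under stated assumptions, not the implementation studied in the paper: reweight_source (with its bandwidth parameter) is a hypothetical similarity-based instance re-weighting stand-in for a transfer learning algorithm, the source/target projects are synthetic stand-ins for real defect datasets, and scikit-learn's random forest plays the role of the classification step.

```python
# Minimal sketch (not the paper's implementation): jointly tuning the parameters
# of a transfer learning step and a classifier for CPDP via random search.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def reweight_source(X_src, X_tgt, bandwidth):
    """Hypothetical transfer step: weight each source-project instance by its
    Gaussian-kernel similarity to the target project (one tunable parameter)."""
    d = ((X_src[:, None, :] - X_tgt[None, :, :]) ** 2).sum(-1)
    return np.exp(-d / (2.0 * bandwidth ** 2)).mean(axis=1)

def evaluate(params, X_src, y_src, X_tgt, y_tgt):
    """Train on the re-weighted source project, score AUC on the target project."""
    w = reweight_source(X_src, X_tgt, params["bandwidth"])
    clf = RandomForestClassifier(
        n_estimators=params["n_estimators"],
        max_depth=params["max_depth"],
        random_state=0,
    ).fit(X_src, y_src, sample_weight=w)
    return roc_auc_score(y_tgt, clf.predict_proba(X_tgt)[:, 1])

# Synthetic stand-ins for a source and a target project (20 metrics per module).
X_src, y_src = rng.normal(size=(300, 20)), rng.integers(0, 2, 300)
X_tgt, y_tgt = rng.normal(size=(200, 20)), rng.integers(0, 2, 200)

# Random search over the joint parameter space of both steps.
best_params, best_auc = None, -np.inf
for _ in range(30):
    params = {
        "bandwidth": rng.uniform(0.5, 5.0),          # transfer learning parameter
        "n_estimators": int(rng.integers(50, 300)),  # classifier parameters
        "max_depth": int(rng.integers(2, 10)),
    }
    auc = evaluate(params, X_src, y_src, X_tgt, y_tgt)
    if auc > best_auc:
        best_params, best_auc = params, auc

print(best_params, best_auc)
```

Random search is used here only because it needs no extra dependencies; any hyperparameter optimizer able to search the joint space of both steps, or the parameters of either step alone, would fill the same role.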
