Automated Parameter Optimization of Classification Techniques for Defect Prediction Models

Defect prediction models are classifiers that are trained to identify defect-prone software modules. Such classifiers have configurable parameters that control their characteristics (e.g., the number of trees in a random forest classifier). Recent studies show that these classifiers may underperform due to the use of suboptimal default parameter settings. However, it is impractical to assess all of the possible settings in the parameter spaces. In this paper, we investigate the performance of defect prediction models where Caret — an automated parameter optimization technique - has been applied. Through a case study of 18 datasets from systems that span both proprietary and open source domains, we find that (1) Caret improves the AUC performance of defect prediction models by as much as 40 percentage points; (2) Caret-optimized classifiers are at least as stable as (with 35% of them being more stable than) classifiers that are trained using the default settings; and (3) Caret increases the likelihood of producing a top-performing classifier by as much as 83%. Hence, we conclude that parameter settings can indeed have a large impact on the performance of defect prediction models, suggesting that researchers should experiment with the parameters of the classification techniques. Since automated parameter optimization techniques like Caret yield substantially benefits in terms of performance improvement and stability, while incurring a manageable additional computational cost, they should be included in future defect prediction studies.

[1]  A. Zeller,et al.  Predicting Defects for Eclipse , 2007, Third International Workshop on Predictor Models in Software Engineering (PROMISE'07: ICSE Workshops 2007).

[2]  Brian D. Ripley,et al.  Feed-Forward Neural Networks and Multinomial Log-Linear Models , 2015 .

[3]  Premkumar T. Devanbu,et al.  How, and why, process metrics are better , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[4]  Seetha Hari,et al.  Learning From Imbalanced Data , 2019, Advances in Computer and Electrical Engineering.

[5]  Frank E. Harrell,et al.  Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis , 2001 .

[6]  Ayse Basar Bener,et al.  Exploiting the Essential Assumptions of Analogy-Based Effort Estimation , 2012, IEEE Transactions on Software Engineering.

[7]  TurhanBurak On the dataset shift problem in software engineering prediction models , 2012 .

[8]  Alain Abran,et al.  A systematic literature review: Opinion mining studies from mobile app store user reviews , 2017, J. Syst. Softw..

[9]  Ayse Basar Bener,et al.  Reducing false alarms in software defect prediction by decision threshold optimization , 2009, 2009 3rd International Symposium on Empirical Software Engineering and Measurement.

[10]  Qinbao Song,et al.  Data Quality: Some Comments on the NASA Software Defect Datasets , 2013, IEEE Transactions on Software Engineering.

[11]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[12]  Michele Lanza,et al.  Evaluating defect prediction approaches: a benchmark and an extensive comparison , 2011, Empirical Software Engineering.

[13]  Tim Menzies,et al.  Finding conclusion stability for selecting the best effort predictor in software effort estimation , 2012, Automated Software Engineering.

[14]  Tracy Hall,et al.  A Systematic Literature Review on Fault Prediction Performance in Software Engineering , 2012, IEEE Transactions on Software Engineering.

[15]  J. Concato,et al.  A simulation study of the number of events per variable in logistic regression analysis. , 1996, Journal of clinical epidemiology.

[16]  Max Kuhn,et al.  Building Predictive Models in R Using the caret Package , 2008 .

[17]  Magne Jørgensen,et al.  A Systematic Review of Software Development Cost Estimation Studies , 2007 .

[18]  Peter Dalgaard,et al.  R Development Core Team (2010): R: A language and environment for statistical computing , 2010 .

[19]  E. Steyerberg,et al.  [Regression modeling strategies]. , 2011, Revista espanola de cardiologia.

[20]  B. Efron Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation , 1983 .

[21]  Rainer Koschke,et al.  Evaluating Defect Prediction Models for a Large Evolving Software System , 2009, 2009 13th European Conference on Software Maintenance and Reengineering.

[22]  Jacob Cohen,et al.  A power primer. , 1992, Psychological bulletin.

[23]  Nachiappan Nagappan,et al.  Predicting Subsystem Failures using Dependency Graph Complexities , 2007, The 18th IEEE International Symposium on Software Reliability (ISSRE '07).

[24]  M. Kenward,et al.  An Introduction to the Bootstrap , 2007 .

[25]  Jacob Cohen Statistical Power Analysis for the Behavioral Sciences , 1969, The SAGE Encyclopedia of Research Design.

[26]  Yoshua Bengio,et al.  Random Search for Hyper-Parameter Optimization , 2012, J. Mach. Learn. Res..

[27]  Stefan Fritsch,et al.  neuralnet: Training of Neural Networks , 2010, R J..

[28]  Michele Lanza,et al.  An extensive comparison of bug prediction approaches , 2010, 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010).

[29]  Lech Madeyski,et al.  Towards identifying software project clusters with regard to defect prediction , 2010, PROMISE '10.

[30]  References , 1971 .

[31]  Bart Baesens,et al.  Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings , 2008, IEEE Transactions on Software Engineering.

[32]  Yue Jiang,et al.  Can data transformation help in the detection of fault-prone modules? , 2008, DEFECTS '08.

[33]  Hongfang Liu,et al.  An investigation of the effect of module size on defect prediction using static measures , 2005, PROMISE@ICSE.

[34]  Shane McIntosh,et al.  An Empirical Comparison of Model Validation Techniques for Defect Prediction Models , 2017, IEEE Transactions on Software Engineering.

[35]  Shane McIntosh,et al.  Revisiting the Impact of Classification Techniques on the Performance of Defect Prediction Models , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[36]  Harald C. Gall,et al.  Cross-project Defect Prediction , 2009 .

[37]  Myra B. Cohen,et al.  Learning Combinatorial Interaction Test Generation Strategies Using Hyperheuristic Search , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[38]  Yue Jiang,et al.  Techniques for evaluating fault prediction models , 2008, Empirical Software Engineering.

[39]  Rongxin Wu,et al.  Dealing with noise in defect prediction , 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[40]  Rongxin Wu,et al.  ReLink: recovering links between bugs and changes , 2011, ESEC/FSE '11.

[41]  Magne Jørgensen,et al.  A Systematic Review of Software Development Cost Estimation Studies , 2007, IEEE Transactions on Software Engineering.

[42]  Senén Barro,et al.  Do we need hundreds of classifiers to solve real world classification problems? , 2014, J. Mach. Learn. Res..

[43]  Mark Harman,et al.  Search Based Software Engineering: Techniques, Taxonomy, Tutorial , 2010, LASER Summer School.

[44]  Tracy Hall,et al.  Researcher Bias: The Use of Machine Learning in Software Defect Prediction , 2014, IEEE Transactions on Software Engineering.

[45]  Ingunn Myrtveit,et al.  Reliability and validity in comparative studies of software prediction models , 2005, IEEE Transactions on Software Engineering.

[46]  Chakkrit Tantithamthavorn Towards a Better Understanding of the Impact of Experimental Components on Defect Prediction Modelling , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering Companion (ICSE-C).

[47]  Edward R. Dougherty,et al.  Is cross-validation valid for small-sample microarray classification? , 2004, Bioinform..

[48]  Max Kuhn,et al.  caret: Classification and Regression Training , 2015 .

[49]  Leo Breiman,et al.  Big Random Forests: Classification and Regression Forests forLarge Data Sets , 2014 .

[50]  Xin Yao,et al.  The impact of parameter tuning on software effort estimation using learning machines , 2013, PROMISE.

[51]  S. K. Lo,et al.  Reliability and Validity , 2020, International Encyclopedia of Human Geography.

[52]  Ioannis Stamelos,et al.  Software Defect Prediction Using Regression via Classification , 2006, IEEE International Conference on Computer Systems and Applications, 2006..

[53]  Ken-ichi Matsumoto,et al.  The Impact of Mislabelling on the Performance and Interpretation of Defect Prediction Models , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[54]  Rainer Koschke,et al.  Revisiting the evaluation of defect prediction models , 2009, PROMISE '09.

[55]  Tim Menzies,et al.  Special issue on repeatable results in software engineering prediction , 2012, Empirical Software Engineering.

[56]  Lefteris Angelis,et al.  Ranking and Clustering Software Cost Estimation Models through a Multiple Comparisons Algorithm , 2013, IEEE Transactions on Software Engineering.

[57]  Thomas J. Ostrand,et al.  \{PROMISE\} Repository of empirical software engineering data , 2007 .

[58]  Thilo Mende,et al.  Replication of defect prediction studies: problems, pitfalls and recommendations , 2010, PROMISE '10.

[59]  Qinbao Song,et al.  A General Software Defect-Proneness Prediction Framework , 2011, IEEE Transactions on Software Engineering.

[60]  Elaine J. Weyuker,et al.  Comparing negative binomial and recursive partitioning models for fault prediction , 2008, PROMISE '08.

[61]  Shin Yoo,et al.  Search-Based Software Engineering , 2014, Lecture Notes in Computer Science.

[62]  Ken-ichi Matsumoto,et al.  Comments on “Researcher Bias: The Use of Machine Learning in Software Defect Prediction” , 2016, IEEE Transactions on Software Engineering.

[63]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[64]  Mohammad Alshayeb,et al.  Software defect prediction using ensemble learning on selected features , 2015, Inf. Softw. Technol..

[65]  Harald C. Gall,et al.  Cross-project defect prediction: a large scale experiment on data vs. domain vs. process , 2009, ESEC/SIGSOFT FSE.

[66]  Burak Turhan,et al.  On the dataset shift problem in software engineering prediction models , 2011, Empirical Software Engineering.