The Impact of Automated Parameter Optimization on Defect Prediction Models

Defect prediction models—classifiers that identify defect-prone software modules—have configurable parameters that control their characteristics (e.g., the number of trees in a random forest). Recent studies show that these classifiers underperform when default settings are used. In this paper, we study the impact of automated parameter optimization on defect prediction models. Through a case study of 18 datasets, we find that automated parameter optimization: (1) improves AUC performance by up to 40 percentage points; (2) yields classifiers that are at least as stable as those trained using default settings; (3) substantially shifts the importance ranking of variables, with as few as 28 percent of the top-ranked variables in optimized classifiers also being top-ranked in non-optimized classifiers; (4) yields optimized settings for 17 of the 20 most sensitive parameters that transfer among datasets without a statistically significant drop in performance; and (5) adds less than 30 minutes of additional computation to 12 of the 26 studied classification techniques. While widely-used classification techniques like random forest and support vector machines are not optimization-sensitive, traditionally overlooked techniques like C5.0 and neural networks can actually outperform widely-used techniques after optimization is applied. This highlights the importance of exploring the parameter space when using parameter-sensitive classification techniques.
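The optimization the abstract describes amounts to searching a classifier's parameter grid and keeping the setting that maximizes validation AUC (the paper itself uses caret's grid search in R across 26 techniques). A minimal, self-contained sketch of that idea, using a toy 1-D nearest-neighbour scorer as a hypothetical stand-in for a real classification technique:

```python
def auc(scores, labels):
    # Wilcoxon formulation of AUC: the probability that a randomly chosen
    # defective module is ranked above a randomly chosen clean one
    # (ties count as half a win).
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def knn_scores(train, test, k):
    # Score each test module by the fraction of defective modules among
    # its k nearest training neighbours (1-D distance on a single metric).
    # 'k' plays the role of a configurable parameter like a random
    # forest's number of trees.
    scores = []
    for x, _ in test:
        nearest = sorted(train, key=lambda t: abs(t[0] - x))[:k]
        scores.append(sum(y for _, y in nearest) / k)
    return scores

def grid_search(train, valid, grid):
    # Try every candidate parameter setting and keep the one with the
    # highest validation AUC, instead of trusting the default.
    best_k, best_auc = None, -1.0
    for k in grid:
        a = auc(knn_scores(train, valid, k), [y for _, y in valid])
        if a > best_auc:
            best_k, best_auc = k, a
    return best_k, best_auc

# Toy data: (metric value, defective?) pairs where defect-prone modules
# have larger metric values.
train = [(i / 10, int(i >= 5)) for i in range(10)]
valid = [(0.12, 0), (0.34, 0), (0.61, 1), (0.83, 1)]
best_k, best_auc = grid_search(train, valid, grid=[1, 3, 5])
```

This is a sketch of the search procedure only; the study's actual grid, classifiers, and 100-repetition out-of-sample bootstrap validation are far richer than this illustration.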
