The Impact of Automated Parameter Optimization on Defect Prediction Models

Defect prediction models—classifiers that identify defect-prone software modules—have configurable parameters that control their characteristics (e.g., the number of trees in a random forest). Recent studies show that these classifiers underperform when default settings are used. In this paper, we study the impact of automated parameter optimization on defect prediction models. Through a case study of 18 datasets, we find that automated parameter optimization: (1) improves AUC performance by up to 40 percentage points; (2) yields classifiers that are at least as stable as those trained using default settings; (3) substantially shifts the importance ranking of variables, with as few as 28 percent of the top-ranked variables in optimized classifiers also being top-ranked in non-optimized classifiers; (4) yields optimized settings for 17 of the 20 most sensitive parameters that transfer among datasets without a statistically significant drop in performance; and (5) adds less than 30 minutes of additional computation to 12 of the 26 studied classification techniques. While widely-used classification techniques like random forest and support vector machines are not optimization-sensitive, traditionally overlooked techniques like C5.0 and neural networks can actually outperform widely-used techniques after optimization is applied. This highlights the importance of exploring the parameter space when using parameter-sensitive classification techniques.
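The optimization the abstract describes amounts to searching a classifier's parameter grid and keeping the setting that maximizes validation AUC (the paper itself uses caret's grid search in R across 26 techniques). A minimal, self-contained sketch of that idea, using a toy 1-D nearest-neighbour scorer as a hypothetical stand-in for a real classification technique:

```python
def auc(scores, labels):
    # Wilcoxon formulation of AUC: the probability that a randomly chosen
    # defective module is ranked above a randomly chosen clean one
    # (ties count as half a win).
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def knn_scores(train, test, k):
    # Score each test module by the fraction of defective modules among
    # its k nearest training neighbours (1-D distance on a single metric).
    # 'k' plays the role of a configurable parameter like a random
    # forest's number of trees.
    scores = []
    for x, _ in test:
        nearest = sorted(train, key=lambda t: abs(t[0] - x))[:k]
        scores.append(sum(y for _, y in nearest) / k)
    return scores

def grid_search(train, valid, grid):
    # Try every candidate parameter setting and keep the one with the
    # highest validation AUC, instead of trusting the default.
    best_k, best_auc = None, -1.0
    for k in grid:
        a = auc(knn_scores(train, valid, k), [y for _, y in valid])
        if a > best_auc:
            best_k, best_auc = k, a
    return best_k, best_auc

# Toy data: (metric value, defective?) pairs where defect-prone modules
# have larger metric values.
train = [(i / 10, int(i >= 5)) for i in range(10)]
valid = [(0.12, 0), (0.34, 0), (0.61, 1), (0.83, 1)]
best_k, best_auc = grid_search(train, valid, grid=[1, 3, 5])
```

This is a sketch of the search procedure only; the study's actual grid, classifiers, and 100-repetition out-of-sample bootstrap validation are far richer than this illustration.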
