Paper Information - Collective Personalized Change Classification With Multiobjective Search

Many change classification techniques have been proposed to identify defect-prone changes. These techniques use all developers' historical change data to build a global prediction model. In practice, developers have their own coding preferences and behavioral patterns, which lead to different defect patterns, so a separate change classification model for each developer can help improve performance. Jiang, Tan, and Kim refer to this problem as personalized change classification, and they propose PCC+ to solve it. A software project has many developers, and building a developer's prediction model from not only his/her own change data but also other relevant developers' change data can further improve the performance of change classification. In this paper, we propose a more accurate technique named collective personalized change classification (CPCC), which leverages a multiobjective genetic algorithm. For a project, CPCC first builds a personalized prediction model for each developer based on his/her historical data. Next, for each developer, CPCC combines these models by assigning them different weights, with the purpose of maximizing two objective functions (i.e., F1-score and cost effectiveness). To further improve the prediction accuracy, we propose CPCC+ by combining CPCC with PCC proposed by Jiang, Tan, and Kim. To evaluate the benefits of CPCC+ and CPCC, we perform experiments on six large software projects from different communities: Eclipse JDT, Jackrabbit, Linux kernel, Lucene, PostgreSQL, and Xorg. The experiment results show that CPCC+ can discover up to 245 more bugs than PCC+ (468 versus 223 for PostgreSQL) if developers inspect the top 20% of lines of code that are predicted buggy. In addition, CPCC+ achieves F1-scores of 0.60-0.75, which are statistically significantly higher than those of PCC+ on all six projects.
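The weighted combination and the two objectives described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 0.5 decision threshold, the toy data, and the exact cost-effectiveness formula (fraction of buggy changes found within a 20% lines-of-code budget) are all assumptions made for illustration, and the genetic search itself is omitted. Only the evaluation of one candidate weight vector and the Pareto-dominance test are shown.

```python
from typing import List, Sequence, Tuple


def combine_scores(weights: Sequence[float],
                   model_scores: Sequence[Sequence[float]]) -> List[float]:
    """Weighted average of each model's buggy-confidence score per change."""
    total = sum(weights)
    n_changes = len(model_scores[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, model_scores)) / total
        for i in range(n_changes)
    ]


def f1_score(predicted: Sequence[bool], actual: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the buggy class."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cost_effectiveness(scores: Sequence[float], actual: Sequence[int],
                       loc: Sequence[int], budget: float = 0.2) -> float:
    """Assumed definition: fraction of buggy changes found when inspecting
    changes in descending score order within `budget` of the total LOC."""
    limit = budget * sum(loc)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    inspected, found = 0, 0
    for i in order:
        if inspected + loc[i] > limit:
            break
        inspected += loc[i]
        found += actual[i]
    total_buggy = sum(actual)
    return found / total_buggy if total_buggy else 0.0


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """Pareto dominance when both objectives are maximized."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


# Hypothetical toy data: two developers' models scoring four changes.
MODEL_SCORES = [[0.9, 0.2, 0.6, 0.1],
                [0.3, 0.8, 0.7, 0.2]]
ACTUAL = [1, 0, 1, 0]      # 1 = buggy change, 0 = clean change
LOC = [10, 40, 20, 30]     # lines of code touched by each change


def evaluate(weights: Sequence[float]) -> Tuple[float, float]:
    """Return (F1-score, cost effectiveness) for one candidate weight vector."""
    scores = combine_scores(weights, MODEL_SCORES)
    predicted = [s >= 0.5 for s in scores]  # assumed decision threshold
    return (f1_score(predicted, ACTUAL),
            cost_effectiveness(scores, ACTUAL, LOC))


if __name__ == "__main__":
    for w in [(1.0, 0.0), (0.0, 1.0), (0.7, 0.3)]:
        print(w, evaluate(w))
```

A multiobjective search would generate many such weight vectors per developer and keep those whose objective pair is not dominated by any other candidate.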
