Collective Personalized Change Classification With Multiobjective Search

Many change classification techniques have been proposed to identify defect-prone changes. These techniques consider all developers' historical change data to build a global prediction model. In practice, since developers have their own coding preferences and behavioral patterns, which causes different defect patterns, a separate change classification model for each developer can help to improve performance. Jiang, Tan, and Kim refer to this problem as personalized change classification, and they propose PCC+ to solve this problem. A software project has a number of developers; for a developer, building a prediction model not only based on his/her change data, but also on other relevant developers' change data can further improve the performance of change classification. In this paper, we propose a more accurate technique named collective personalized change classification (CPCC), which leverages a multiobjective genetic algorithm. For a project, CPCC first builds a personalized prediction model for each developer based on his/her historical data. Next, for each developer, CPCC combines these models by assigning different weights to these models with the purpose of maximizing two objective functions (i.e., F1-scores and cost effectiveness). To further improve the prediction accuracy, we propose CPCC+ by combining CPCC with PCC proposed by Jiang, Tan, and Kim To evaluate the benefits of CPCC+ and CPCC, we perform experiments on six large software projects from different communities: Eclipse JDT, Jackrabbit, Linux kernel, Lucene, PostgreSQL, and Xorg. The experiment results show that CPCC+ can discover up to 245 more bugs than PCC+ (468 versus 223 for PostgreSQL) if developers inspect the top 20% lines of code that are predicted buggy. In addition, CPCC+ can achieve F1-scores of 0.60-0.75, which are statistically significantly higher than those of PCC+ on all of the six projects.

[1]  David Lo,et al.  HYDRA: Massively Compositional Model for Cross-Project Defect Prediction , 2016, IEEE Transactions on Software Engineering.

[2]  Kalyanmoy Deb,et al.  Multi-objective optimization using evolutionary algorithms , 2001, Wiley-Interscience series in systems and optimization.

[3]  N. Cliff Ordinal methods for behavioral data analysis , 1996 .

[4]  Kalyanmoy Deb,et al.  Simulated Binary Crossover for Continuous Search Space , 1995, Complex Syst..

[5]  Thomas Zimmermann,et al.  Automatic Identification of Bug-Introducing Changes , 2006, 21st IEEE/ACM International Conference on Automated Software Engineering (ASE'06).

[6]  Mario Jino,et al.  Diversity oriented test data generation using metaheuristic search techniques , 2014, Inf. Sci..

[7]  Lin Tan,et al.  Correlations between bugginess and time-based commit characteristics , 2014, Empirical Software Engineering.

[8]  Gerardo Canfora,et al.  Multi-objective Cross-Project Defect Prediction , 2013, 2013 IEEE Sixth International Conference on Software Testing, Verification and Validation.

[9]  Ayse Basar Bener,et al.  On the relative value of cross-company and within-company data for defect prediction , 2009, Empirical Software Engineering.

[10]  Anh Tuan Nguyen,et al.  Multi-layered approach for recovering links between bug reports and fixes , 2012, SIGSOFT FSE.

[11]  Kalyanmoy Deb,et al.  A combined genetic adaptive search (GeneAS) for engineering design , 1996 .

[12]  Guangchun Luo,et al.  Transfer learning for cross-company software defect prediction , 2012, Inf. Softw. Technol..

[13]  Kalyanmoy Deb,et al.  A fast and elitist multiobjective genetic algorithm: NSGA-II , 2002, IEEE Trans. Evol. Comput..

[14]  Premkumar T. Devanbu,et al.  Sample size vs. bias in defect prediction , 2013, ESEC/FSE 2013.

[15]  David Lo,et al.  An Empirical Study of Classifier Combination for Cross-Project Defect Prediction , 2015, 2015 IEEE 39th Annual Computer Software and Applications Conference.

[16]  Tim Menzies,et al.  Automated severity assessment of software defect reports , 2008, 2008 IEEE International Conference on Software Maintenance.

[17]  Taghi M. Khoshgoftaar,et al.  Evolutionary Optimization of Software Quality Modeling with Multiple Repositories , 2010, IEEE Transactions on Software Engineering.

[18]  Yi Zhang,et al.  Classifying Software Changes: Clean or Buggy? , 2008, IEEE Transactions on Software Engineering.

[19]  Mark Harman,et al.  Searching for better configurations: a rigorous approach to clone evaluation , 2013, ESEC/FSE 2013.

[20]  Michele Lanza,et al.  An extensive comparison of bug prediction approaches , 2010, 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010).

[21]  Sunghun Kim,et al.  Reducing Features to Improve Code Change-Based Bug Prediction , 2013, IEEE Transactions on Software Engineering.

[22]  F. Wilcoxon Individual Comparisons by Ranking Methods , 1945 .

[23]  Sinno Jialin Pan,et al.  Transfer defect learning , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[24]  David Lo,et al.  Automatic, high accuracy prediction of reopened bugs , 2014, Automated Software Engineering.

[25]  H. Abdi The Bonferonni and Šidák Corrections for Multiple Comparisons , 2006 .

[26]  Andrea De Lucia,et al.  How to effectively use topic models for software engineering tasks? An approach based on Genetic Algorithms , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[27]  S. N. Sivanandam,et al.  Introduction to genetic algorithms , 2007 .

[28]  Premkumar T. Devanbu,et al.  How, and why, process metrics are better , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[29]  Taghi M. Khoshgoftaar,et al.  Improving Software-Quality Predictions With Data Sampling and Boosting , 2009, IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans.

[30]  Ahmed E. Hassan,et al.  Think locally, act globally: Improving defect and effort prediction models , 2012, 2012 9th IEEE Working Conference on Mining Software Repositories (MSR).

[31]  Mark Harman,et al.  Search-based software engineering , 2001, Inf. Softw. Technol..

[32]  Madjid Tavana,et al.  Multi-objective control chart design optimization using NSGA-III and MOPSO enhanced with DEA and TOPSIS , 2016, Expert Syst. Appl..

[33]  Jane Cleland-Huang,et al.  Improving trace accuracy through data-driven configuration and composition of tracing features , 2013, ESEC/FSE 2013.

[34]  Yoav Freund,et al.  The Alternating Decision Tree Learning Algorithm , 1999, ICML.

[35]  Harald C. Gall,et al.  Cross-project defect prediction: a large scale experiment on data vs. domain vs. process , 2009, ESEC/SIGSOFT FSE.

[36]  Audris Mockus,et al.  A large-scale empirical study of just-in-time quality assurance , 2013, IEEE Transactions on Software Engineering.

[37]  Andrea De Lucia,et al.  Cross-project defect prediction models: L'Union fait la force , 2014, 2014 Software Evolution Week - IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE).

[38]  David W. Coit,et al.  A two-stage approach for multi-objective decision making with applications to system reliability optimization , 2009, Reliab. Eng. Syst. Saf..

[39]  Valery Buzungu,et al.  Predicting Fault-prone Components in a Java Legacy System , 2006 .

[40]  Tian Jiang,et al.  Personalized defect prediction , 2013, 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[41]  Elaine J. Weyuker,et al.  Programmer-based fault prediction , 2010, PROMISE '10.

[42]  Tim Menzies,et al.  Better cross company defect prediction , 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).

[43]  Premkumar T. Devanbu,et al.  Recalling the "imprecision" of cross-project defect prediction , 2012, SIGSOFT FSE.

[44]  Xinli Yang,et al.  Deep Learning for Just-in-Time Defect Prediction , 2015, 2015 IEEE International Conference on Software Quality, Reliability and Security.

[45]  David Lo,et al.  Identifying Linux bug fixing patches , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[46]  David Lo,et al.  Tag recommendation in software information sites , 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).

[47]  Zhaojun Li,et al.  Multi-Objective and Multi-Stage Reliability Growth Planning in Early Product-Development Stage , 2016, IEEE Transactions on Reliability.

[48]  Rongxin Wu,et al.  Dealing with noise in defect prediction , 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[49]  Rongxin Wu,et al.  ReLink: recovering links between bugs and changes , 2011, ESEC/FSE '11.

[50]  Ayse Basar Bener,et al.  Empirical evaluation of the effects of mixed project data on learning defect predictors , 2013, Inf. Softw. Technol..

[51]  Yuanyuan Zhang,et al.  Search-based software engineering: Trends, techniques and applications , 2012, CSUR.

[52]  Ahmed Tamrawi,et al.  Fuzzy set and cache-based approach for bug triaging , 2011, ESEC/FSE '11.

[53]  Jaime Spacco,et al.  SZZ revisited: verifying when changes induce fixes , 2008, DEFECTS '08.

[54]  David Lo,et al.  Empirical Evaluation of Bug Linking , 2013, 2013 17th European Conference on Software Maintenance and Reengineering.

[55]  Qingfu Zhang,et al.  MOEA/D: A Multiobjective Evolutionary Algorithm Based on Decomposition , 2007, IEEE Transactions on Evolutionary Computation.

[56]  Yang Feng,et al.  Towards more accurate multi-label software behavior learning , 2014, 2014 Software Evolution Week - IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE).

[57]  Ayse Basar Bener,et al.  Ensemble of software defect predictors: a case study , 2008, ESEM '08.

[58]  Nachiappan Nagappan,et al.  Predicting defects using network analysis on dependency graphs , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[59]  Claire Le Goues,et al.  GenProg: A Generic Method for Automatic Software Repair , 2012, IEEE Transactions on Software Engineering.

[60]  R. Suganya,et al.  Data Mining Concepts and Techniques , 2010 .

[61]  E. James Whitehead,et al.  Predicting buggy changes inside an integrated development environment , 2007, eclipse '07.

[62]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[63]  David E. Goldberg,et al.  Genetic algorithms and Machine Learning , 1988, Machine Learning.

[64]  Lionel C. Briand,et al.  Data Mining Techniques for Building Fault-proneness Models in Telecom Java Software , 2007, The 18th IEEE International Symposium on Software Reliability (ISSRE '07).

[65]  Burak Turhan,et al.  Learning Better Inspection Optimization Policies , 2012, Int. J. Softw. Eng. Knowl. Eng..