Fine-grained incremental learning and multi-feature tossing graphs to improve bug triaging

Software bugs are inevitable, and bug fixing is a difficult, expensive, and lengthy process. One of the primary reasons bug fixing takes so long is the difficulty of accurately assigning a bug to the developer most competent to fix that class of bug. Assigning a bug to a potential developer, a task known as bug triaging, is labor-intensive, time-consuming, and error-prone when done manually. Moreover, bugs are frequently reassigned to multiple developers before they are resolved, a process known as bug tossing. Researchers have proposed automated techniques that use machine-learning-based prediction and tossing graphs to facilitate bug triaging and reduce bug tossing. While these techniques achieve good triaging accuracy and shorten tossing paths, they suffer from several limitations: outdated training sets, inactive developers, and imprecise, single-attribute tossing graphs. In this paper we improve triaging accuracy and reduce tossing path lengths by employing several techniques: refined classification using additional attributes and intra-fold updates during training, a precise ranking function for recommending potential tossees in tossing graphs, and multi-feature tossing graphs. We validate our approach on two large software projects, Mozilla and Eclipse, covering 856,259 bug reports and 21 cumulative years of development. Our techniques achieve up to 83.62% prediction accuracy in bug triaging. Moreover, we reduce tossing path lengths to 1.5–2 tosses for most bugs, a reduction of up to 86.31% compared to the original tossing paths. These improvements have the potential to significantly reduce bug-fixing effort, especially in sizable projects with large numbers of testers and developers.
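To make the tossing-graph idea concrete, here is a minimal Python sketch. It assumes a goal-oriented model in which every developer on a tossing path is linked directly to the developer who finally fixed the bug. Toss probabilities are estimated from edge counts, and each edge is annotated with the product and component of the bugs that traversed it. The ranking rule shown here is a hypothetical stand-in, because the abstract does not specify the paper's actual ranking function. It takes a weighted sum of toss probability and product/component match, restricted to developers assumed to be active. The names TossingGraph, rank_tossees, w_prob, and w_feat are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TossingGraph:
    # (from_dev, to_dev) -> number of goal-oriented tosses observed
    counts: dict = field(default_factory=lambda: defaultdict(int))
    # (from_dev, to_dev) -> {(product, component): count}; the extra "features"
    features: dict = field(default_factory=lambda: defaultdict(lambda: defaultdict(int)))
    # from_dev -> total number of outgoing tosses
    out_totals: dict = field(default_factory=lambda: defaultdict(int))

    def add_path(self, path, product, component):
        """Record one tossing path; the last developer is assumed to be the fixer."""
        fixer = path[-1]
        for dev in path[:-1]:
            if dev == fixer:  # skip self-edges if the fixer appears earlier in the path
                continue
            self.counts[(dev, fixer)] += 1
            self.features[(dev, fixer)][(product, component)] += 1
            self.out_totals[dev] += 1

    def toss_probability(self, src, dst):
        """Fraction of src's outgoing tosses that ended with dst fixing the bug."""
        total = self.out_totals.get(src, 0)
        return self.counts.get((src, dst), 0) / total if total else 0.0

    def rank_tossees(self, src, product, component, active, w_prob=1.0, w_feat=0.5):
        """Hypothetical ranking: weighted toss probability plus feature match, active devs only."""
        scores = {}
        for (a, b), c in self.counts.items():
            if a != src or b not in active:
                continue
            # share of the a->b tosses that involved the same product and component
            feat = self.features[(a, b)].get((product, component), 0) / c
            scores[b] = w_prob * self.toss_probability(a, b) + w_feat * feat
        # highest score first; ties broken alphabetically for determinism
        return sorted(scores, key=lambda d: (-scores[d], d))


# Toy data (invented for illustration, not from Mozilla or Eclipse)
g = TossingGraph()
g.add_path(["alice", "bob", "carol"], "Firefox", "Layout")
g.add_path(["alice", "carol"], "Firefox", "Layout")
g.add_path(["alice", "dave"], "Firefox", "Networking")

print(g.rank_tossees("alice", "Firefox", "Networking", active={"carol", "dave"}))
```

In this toy data, carol receives more of alice's tosses overall (probability 2/3 versus 1/3). Even so, a Networking bug is routed from alice to dave first, because the component match outweighs the raw toss probability under these weights. Dropping dave from the active set removes him from the ranking entirely, which is one simple way to keep inactive developers from being recommended.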
