Cost-aware triage ranking algorithms for bug reporting systems

Bug triage, the task of deciding which developer should fix a given bug, has been studied actively. However, existing work does not consider that the cost of fixing the same bug varies across developers with diverse backgrounds and experience. In contrast, we argue that the "cost" of a bug can be low for one developer but high for another. Based on this view, we study an automatic triage system that considers both accuracy and cost. Our preliminary solution, CosTriage, models each developer's experience and estimated cost for each bug category, where categories are obtained from topic modeling, and assigns a bug to the developer who not only can fix it but is also expected to fix it quickly. For developer-specific cost modeling, we draw inspiration from recommender systems, which estimate user-specific ratings of items such as movies. From this viewpoint, existing triage work, which categorizes bugs and assigns them to developers experienced in that category, is a form of content-based recommendation (CBR). However, CBR is well known to cause overspecialization, because it recommends to each developer only the types of bugs that developer has fixed before. This problem is critical because experienced developers can become overloaded with bugs they dislike fixing, even though there are other categories they could fix faster. CosTriage therefore adopts content-boosted collaborative filtering (CBCF), which considers not only similar bugs (content-based) but also similar developers (collaborative) when estimating developer-specific cost.

In this paper, we extend this approach to two special scenarios. First, a bug may have no textual report (e.g., a crash report), or its textual report may lack any topic word (e.g., 1,957 of 48,424 Mozilla reports). Second, developer profiles may change over time. To handle these scenarios, we extend CosTriage to support non-textual descriptions and dynamic profiles; we denote the extended system CosTriage+.
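
The cost-aware assignment idea can be illustrated with a small sketch. It is a simplified, generic CBCF recipe, not CosTriage's exact formulation: a developer × topic matrix of average fix times (in days) has gaps where a developer has never fixed a bug of that topic; a content-based step fills each gap from the developer's own costs on similar topics, and a collaborative step then re-predicts the gap from Pearson-correlated developers on the filled (pseudo) matrix. The topic-similarity matrix (used here as the "content"), the per-developer accuracy scores (standing in for a triage classifier), the bug's topic distribution `theta` (standing in for topic-model output), and the `alpha`-weighted hybrid score are all hypothetical inputs chosen for illustration.

```python
import math
from typing import List, Optional


def content_fill(cost: List[List[Optional[float]]],
                 topic_sim: List[List[float]]) -> List[List[float]]:
    """Content-based step: fill each missing (developer, topic) cost with a
    topic-similarity-weighted average of that developer's observed costs."""
    pseudo = []
    for row in cost:
        filled = []
        for t, c in enumerate(row):
            if c is not None:
                filled.append(c)
                continue
            num = den = 0.0
            for s, c_s in enumerate(row):
                if c_s is not None:
                    num += topic_sim[t][s] * c_s
                    den += topic_sim[t][s]
            # Assumes each developer has an observed topic with nonzero similarity.
            filled.append(num / den)
        pseudo.append(filled)
    return pseudo


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two developers' cost vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def cbcf_costs(cost: List[List[Optional[float]]], topic_sim: List[List[float]],
               min_cost: float = 1e-3) -> List[List[float]]:
    """Collaborative step on the dense pseudo matrix: re-predict each missing
    cost from mean-centred costs of correlated developers; keep observed costs."""
    pseudo = content_fill(cost, topic_sim)
    means = [sum(r) / len(r) for r in pseudo]
    pred = [list(r) for r in pseudo]
    for a, row in enumerate(cost):
        for t, c in enumerate(row):
            if c is not None:
                continue
            num = den = 0.0
            for u in range(len(pseudo)):
                if u != a:
                    w = pearson(pseudo[a], pseudo[u])
                    num += w * (pseudo[u][t] - means[u])
                    den += abs(w)
            est = means[a] + num / den if den > 0 else pseudo[a][t]
            pred[a][t] = max(est, min_cost)  # a fix time must stay positive
    return pred


def rank_developers(theta: List[float], pred: List[List[float]],
                    accuracy: List[float], alpha: float = 0.5) -> List[int]:
    """Rank developers for one bug by a hypothetical hybrid score:
    (1 - alpha) * accuracy share + alpha * share of inverse expected cost."""
    bug_cost = [sum(p * c for p, c in zip(theta, row)) for row in pred]
    inv = [1.0 / c for c in bug_cost]
    acc_sum, inv_sum = sum(accuracy), sum(inv)
    score = [(1 - alpha) * a / acc_sum + alpha * i / inv_sum
             for a, i in zip(accuracy, inv)]
    return sorted(range(len(score)), key=lambda d: score[d], reverse=True)


# Toy data: average fix days per (developer, topic); None = no history.
cost = [[2.0, 6.0, None], [3.0, 7.0, 5.0], [4.0, None, 6.0]]
topic_sim = [[1.0, 0.2, 0.5], [0.2, 1.0, 0.3], [0.5, 0.3, 1.0]]
pred = cbcf_costs(cost, topic_sim)
theta, accuracy = [0.7, 0.1, 0.2], [0.2, 0.5, 0.3]
print(rank_developers(theta, pred, accuracy, alpha=0.0))  # [1, 2, 0]
print(rank_developers(theta, pred, accuracy, alpha=0.5))  # [1, 0, 2]
```

With `alpha=0.0` the ranking follows accuracy alone; with `alpha=0.5`, developer 0, whose expected cost for this bug is lowest, moves from last to second, which mirrors the accuracy-versus-cost trade-off that cost-aware triage aims for.
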
Our experimental evaluation shows that, compared with a baseline that considers only accuracy, our solution reduces cost by 30% without seriously compromising accuracy.
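
For the dynamic-profile scenario, one common way to let a cost profile follow a developer whose skills change is to weight past fixes by recency. The sketch below is an assumption for illustration, not the CosTriage+ mechanism: it computes an exponentially time-decayed mean fix time per (developer, topic), where the record format `(developer, topic, fixed_at_day, fix_days)` and the `half_life` parameter are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record: (developer, topic, fixed_at_day, fix_days)
Record = Tuple[str, int, float, float]


def decayed_profiles(history: Iterable[Record], now: float,
                     half_life: float = 180.0) -> Dict[Tuple[str, int], float]:
    """Time-decayed mean fix time per (developer, topic): a fix recorded
    `half_life` days before `now` counts half as much as one fixed at `now`."""
    lam = math.log(2) / half_life
    num: Dict[Tuple[str, int], float] = defaultdict(float)
    den: Dict[Tuple[str, int], float] = defaultdict(float)
    for dev, topic, fixed_at, fix_days in history:
        w = math.exp(-lam * (now - fixed_at))  # recency weight in (0, 1]
        num[(dev, topic)] += w * fix_days
        den[(dev, topic)] += w
    return {key: num[key] / den[key] for key in num}


history = [("alice", 0, 0.0, 10.0), ("alice", 0, 400.0, 2.0), ("bob", 1, 100.0, 5.0)]
profiles = decayed_profiles(history, now=400.0, half_life=200.0)
print(round(profiles[("alice", 0)], 2))  # 3.6
```

Here a developer who took 10 days on a topic 400 days ago but only 2 days today gets a profile of 3.6 days rather than the plain mean of 6, so a decayed matrix like this could replace static averages as input to cost estimation.
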
[57]  Seung-won Hwang,et al.  CosTriage: A Cost-Aware Triage Algorithm for Bug Reporting Systems , 2011, AAAI.