Assisting bug report triage through recommendation

A key collaborative hub for many software development projects is the issue tracking system, or bug repository. A bug repository can improve the software development process in a number of ways, including allowing geographically distributed developers to communicate about project development. However, each report added to the repository must be triaged by a human, called the triager, to determine whether it is meaningful. If a report is meaningful, the triager decides how to organize it for integration into the project's development process. Triagers can become overwhelmed by the number of reports added to the repository, and time spent triaging typically diverts valuable resources away from improving the product and toward managing the development process. To assist triagers, this talk presents a machine learning approach for creating recommenders that assist with one common triage decision: the assignment of a report to a developer. The recommenders created with this approach are accurate: recommenders for which developer should be assigned a report have a precision of 70% to 98% over five open source projects. In addition, we present an approach that helps project members specify the project-specific values needed to create a developer recommender, and show that such a recommender can be created with a subset of the repository data.

Biography: Dr. John Anvik's research focuses on reducing the management overhead in software development. Currently he is looking at how to improve the management of bug reports. His career has spanned both academia and industry: he has been a researcher at several universities, taught undergraduate classes, and worked as a software developer and as a training manager for a small web-GIS company. He currently lives in Victoria, BC, Canada.

Computer Science Department Colloquium
