An Empirical Study of the Effects of Expert Knowledge on Bug Reports

Bug reports are crucial software artifacts for both software maintenance researchers and practitioners. Researchers typically use bug reports to evaluate automated software maintenance tools: a large repository of reports is used as input for a tool, and metrics are calculated from the tool's output. This process differs from how practitioners treat bug reports: practitioners distinguish between reports written by experts, such as programmers, and reports written by non-experts, such as users. Practitioners recognize that the content of a bug report depends on its author's expert knowledge. In this paper, we present an empirical study of the textual differences between bug reports written by experts and non-experts. We find that a significant difference exists, and that this difference has a significant impact on the results from a state-of-the-art feature location tool. We recommend that researchers evaluate maintenance tools using separate sets of bug reports for experts and non-experts.
