Detecting duplicate bug reports with software engineering domain knowledge

In previous work, Alipour et al. proposed a method for detecting duplicate bug reports that compares the textual content of bug reports against subject-specific contextual material, namely lists of software-engineering terms such as non-functional requirements and architecture keywords. A bug report that contains a word from one of these word lists is considered to be associated with that list's context, and this contextual information tends to improve bug-deduplication methods. Building such word lists by hand, however, requires considerable manual effort. In this paper, we propose a method to partially automate the extraction of contextual word lists from software-engineering literature. An evaluation of this software-literature context method on real-world bug reports indicates that the semi-automated method has the potential to substantially reduce the manual effort required for contextual bug deduplication, at the cost of only a minor loss in accuracy.
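The two steps described above (deriving context word lists from literature, then marking which contexts a bug report touches) can be illustrated with a minimal Python sketch. It rests on stated assumptions rather than the paper's published procedure: each literature section is treated as one context whose candidate word list is its top-k words by TF-IDF (a human reviewer would then prune these, the "semi-automated" part), a report's context feature is the count of list words it mentions (a count above zero meaning the report is associated with that context), and two reports are compared by the cosine similarity of their context vectors. The function names `extract_word_lists`, `context_features`, and `context_similarity`, the stop-word list, and the sample texts are all hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; not taken from the paper).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "on",
              "for", "with", "when", "it", "this", "that", "be", "are", "by"}


def tokenize(text: str) -> list[str]:
    """Lower-cased alphabetic tokens with stop words removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def extract_word_lists(sections: dict[str, str], top_k: int = 5) -> dict[str, set[str]]:
    """Hypothetical extraction step: each literature section becomes one context,
    and its candidate word list is its top_k words by TF-IDF across sections.
    Words occurring in every section get an IDF of zero and are dropped."""
    tokenized = {name: tokenize(text) for name, text in sections.items()}
    n_sections = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))
    word_lists = {}
    for name, tokens in tokenized.items():
        tf = Counter(tokens)
        scores = {w: c * math.log(n_sections / doc_freq[w]) for w, c in tf.items()}
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        word_lists[name] = {w for w in ranked[:top_k] if scores[w] > 0}
    return word_lists


def context_features(report: str, contexts: dict[str, set[str]]) -> dict[str, int]:
    """Count of each context's words present in the report; a count > 0
    marks the report as associated with that context."""
    tokens = set(tokenize(report))
    return {name: len(tokens & words) for name, words in contexts.items()}


def context_similarity(a: dict[str, int], b: dict[str, int]) -> float:
    """Cosine similarity of two reports' context vectors (assumed pair feature);
    both dicts must share the same context keys."""
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Made-up "literature sections"; "app" occurs everywhere, so it is filtered out.
    sections = {
        "performance": "app latency throughput cache memory latency slow cache",
        "security": "app permission encryption certificate permission attack",
        "ui": "app layout screen button rotation layout widget",
    }
    contexts = extract_word_lists(sections, top_k=5)
    r1 = context_features("App is slow, high latency after cache clear", contexts)
    r2 = context_features("Scrolling is slow and cache grows, latency spikes", contexts)
    r3 = context_features("Button layout breaks on screen rotation", contexts)
    print(context_similarity(r1, r2), context_similarity(r1, r3))  # prints: 1.0 0.0
```

In a full deduplication pipeline, a pair feature like this would be one input alongside textual-similarity features to a classifier; the cosine measure is used here only because it is simple and scale-independent, not because the paper prescribes it.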

References

[1] David Lo, et al. Improved Duplicate Bug Report Identification, 2012, 2012 16th European Conference on Software Maintenance and Reengineering.

[2] Ian H. Witten, et al. WEKA: a machine learning workbench, 1994, Proceedings of ANZIIS '94 - Australian New Zealand Intelligent Information Systems Conference.

[3] Ashish Sureka, et al. Detecting Duplicate Bug Report Using Character N-Gram-Based Features, 2010, 2010 Asia Pacific Software Engineering Conference.

[4] Nicholas Jalbert, et al. Automated duplicate detection for bug tracking systems, 2008, 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN).

[5] Eleni Stroulia, et al. A contextual approach towards more accurate duplicate bug report detection and ranking, 2013, Empirical Software Engineering.

[6] Nicholas A. Kraft, et al. New features for duplicate bug detection, 2014, MSR 2014.

[7] Iron Cove Solutions, et al. Microsoft Office 365, 2014.

[8] David Lo, et al. Duplicate bug report detection with a combination of information retrieval and topic modeling, 2012, 2012 Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering.

[9] A Straw, et al. Guide to the Software Engineering Body of Knowledge, 1998.

[10] Yasutaka Kamei, et al. Mining challenge 2012: The Android platform, 2012, 2012 9th IEEE Working Conference on Mining Software Repositories (MSR).

[11] Dongmei Zhang, et al. ReBucket: A method for clustering duplicate crash reports based on call stack similarity, 2012, 2012 34th International Conference on Software Engineering (ICSE).

[12] Per Runeson, et al. Detection of Duplicate Defect Reports Using Natural Language Processing, 2007, 29th International Conference on Software Engineering (ICSE'07).

[13] Mark L. Murphy. The Busy Coder's Guide to Advanced Android Development, 2009.

[14] Siau-Cheng Khoo, et al. Towards more accurate retrieval of duplicate bug reports, 2011, 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011).

[15] Andrew W. Appel, et al. Modern Compiler Implementation in Java, 1997.

[16] David Lo, et al. DupFinder: integrated tool support for duplicate bug report detection, 2014, ASE.

[17] Eleni Stroulia, et al. Understanding Android Fragmentation with Topic Analysis of Vendor-Specific Bugs, 2012, 2012 19th Working Conference on Reverse Engineering.

[18] Thomas Zimmermann, et al. Duplicate bug reports considered harmful … really?, 2008, 2008 IEEE International Conference on Software Maintenance.

[19] Eleni Stroulia, et al. Detecting duplicate bug reports with software engineering domain knowledge, 2015, SANER.

[20] Siau-Cheng Khoo, et al. A discriminative model approach for accurate duplicate bug report retrieval, 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.

[21] Bonita Sharif, et al. Improving the accuracy of duplicate bug report detection using textual similarity measures, 2014, MSR 2014.