Detecting duplicate bug reports with software engineering domain knowledge

In previous work by Alipour et al., a methodology was proposed for detecting duplicate bug reports by comparing the textual content of bug reports to subject-specific contextual material, namely lists of software-engineering terms, such as non-functional requirements and architecture keywords. When a bug report contains a word in these word-list contexts, the bug report is considered to be associated with that context and this information tends to improve bug-deduplication methods. In this paper, we propose a method to partially automate the extraction of contextual word lists from software-engineering literature. Evaluating this software-literature context method on real-world bug reports produces useful results that indicate this semi-automated method has the potential to substantially decrease the manual effort used in contextual bug deduplication while suffering only a minor loss in accuracy.

[1]  Dongmei Zhang,et al.  ReBucket: A method for clustering duplicate crash reports based on call stack similarity , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[2]  Siau-Cheng Khoo,et al.  Towards more accurate retrieval of duplicate bug reports , 2011, 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011).

[3]  Eleni Stroulia,et al.  Understanding Android Fragmentation with Topic Analysis of Vendor-Specific Bugs , 2012, 2012 19th Working Conference on Reverse Engineering.

[4]  Per Runeson,et al.  Detection of Duplicate Defect Reports Using Natural Language Processing , 2007, 29th International Conference on Software Engineering (ICSE'07).

[5]  Nicholas Jalbert,et al.  Automated duplicate detection for bug tracking systems , 2008, 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN).

[6]  Thomas Zimmermann,et al.  Duplicate bug reports considered harmful … really? , 2008, 2008 IEEE International Conference on Software Maintenance.

[7]  Eleni Stroulia,et al.  A contextual approach towards more accurate duplicate bug report detection and ranking , 2013, Empirical Software Engineering.

[8]  Yasutaka Kamei,et al.  Mining challenge 2012: The Android platform , 2012, 2012 9th IEEE Working Conference on Mining Software Repositories (MSR).

[9]  Siau-Cheng Khoo,et al.  A discriminative model approach for accurate duplicate bug report retrieval , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.

[10]  Ashish Sureka,et al.  Detecting Duplicate Bug Report Using Character N-Gram-Based Features , 2010, 2010 Asia Pacific Software Engineering Conference.

[11]  Mark L. Murphy The Busy Coder's Guide to Advanced Android Development , 2009 .

[12]  Bonita Sharif,et al.  Improving the accuracy of duplicate bug report detection using textual similarity measures , 2014, MSR 2014.

[13]  Nicholas A. Kraft,et al.  New features for duplicate bug detection , 2014, MSR 2014.

[14]  Ian H. Witten,et al.  WEKA: a machine learning workbench , 1994, Proceedings of ANZIIS '94 - Australian New Zealnd Intelligent Information Systems Conference.