Can We Detect Bug Report Duplication with Unfinished Bug Reports?

It would be useful if a bug tracking system could detect duplicate bug reports while a new report is still unfinished, that is, while it is being written. To investigate whether this is feasible, in this paper we study the relationship between the number of words in a bug report and the accuracy of duplicate bug report detection based on features extracted from the reports' textual information. The results show that increasing the number of words used for duplicate detection beyond a certain point does not greatly affect accuracy. The results also indicate that it is better to use about 100 words for Eclipse and about 80 words for OpenOffice, because using more words than these may yield many incorrect duplicate candidates. We therefore think that detecting duplication while a new bug report is being written has the potential to provide useful candidate duplicate reports.
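The abstract does not say which textual features the study uses, so the following is only a minimal sketch of the general idea under a loudly stated assumption: it substitutes a plain bag-of-words representation with cosine similarity, which is not necessarily the paper's method. An unfinished report is simulated by keeping only the first `n_words` words of a draft, and existing reports are ranked by similarity to that truncated draft. All function names (`tokenize`, `truncate`, `cosine`, `duplicate_candidates`) are hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: bag-of-words cosine similarity stands in for the paper's
# (unspecified) textual features. This is an illustrative sketch only.


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def truncate(words: list[str], n: int) -> list[str]:
    """Keep only the first n words, simulating a report still being written."""
    return words[:n]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def duplicate_candidates(
    draft: str, existing: dict[str, str], n_words: int, k: int = 5
) -> list[tuple[str, float]]:
    """Rank existing reports by similarity to the first n_words of a draft."""
    query = Counter(truncate(tokenize(draft), n_words))
    scores = [
        (report_id, cosine(query, Counter(tokenize(text))))
        for report_id, text in existing.items()
    ]
    # Highest similarity first; the top k are the duplicate candidates.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


existing = {
    "BUG-1": "Editor crashes when saving a large file",
    "BUG-2": "Toolbar icons are blurry on high DPI displays",
}
draft = "The editor crashes every time I try saving a large project file"
print(duplicate_candidates(draft, existing, n_words=6, k=1))
```

Repeating the ranking for several values of `n_words` (for instance, around the 100 and 80 words the abstract reports for Eclipse and OpenOffice) is one way to explore the trade-off described above: too few words give little to match on, while more words may pull in unrelated candidates.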
