Finding Duplicates of Your Yet Unwritten Bug Report

Software projects often use bug-tracking tools to keep track of reported bugs and to provide a communication platform to discuss possible solutions or ways to reproduce failures. The goal is to reduce testing efforts for the development team. However, multiple bug reports are often submitted for the same bug, which, if not recognized as duplicates, can result in duplicated work by the development team. Recognizing duplicates is, in turn, tedious, as it requires examining large numbers of bug reports. Previous work addresses this problem by employing natural-language processing and text-similarity measures to automate bug-report duplicate detection. The downside of these techniques is that, to be applicable, they require a reporting user to go through the time-consuming process of describing the problem, only to be informed that the bug is already known. To address this problem, we propose an approach that uses only stack traces and their structure as input to machine-learning algorithms for detecting bug-report duplicates. The key advantage is that stack traces are available without a written bug report. Experiments on bug reports from the Eclipse project show that our approach performs as well as state-of-the-art techniques, but without requiring the whole text corpus of a bug report to be available.
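
To make the idea of using stack traces and their structure concrete, the following is a minimal sketch of how two Java-style stack traces could be turned into numeric features for a duplicate classifier. It is an illustration under stated assumptions, not the feature set used in the paper: the function names (`parse_frames`, `pair_features`), the regular expression, and the two features (frame-set overlap and shared top-of-stack prefix) are all hypothetical choices made for this example.

```python
import re

# Assumption: frames look like "    at pkg.Class.method(File.java:42)",
# the usual layout of a printed Java stack trace.
FRAME_RE = re.compile(r"^\s*at\s+([\w.$<>]+)\(")


def parse_frames(trace: str) -> list[str]:
    """Return fully qualified method names, top of the stack first."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append(match.group(1))
    return frames


def pair_features(trace_a: str, trace_b: str) -> dict[str, float]:
    """Compute two illustrative similarity features for a pair of traces.

    These features are hypothetical stand-ins; they only show how
    structural information could be fed to a learning algorithm.
    """
    a, b = parse_frames(trace_a), parse_frames(trace_b)

    # Feature 1: Jaccard overlap of the sets of frames (ignores order).
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    jaccard = len(set_a & set_b) / len(union) if union else 0.0

    # Feature 2: number of identical frames at the top of both stacks,
    # normalized by the length of the shorter trace (uses order).
    prefix = 0
    for frame_a, frame_b in zip(a, b):
        if frame_a != frame_b:
            break
        prefix += 1
    shortest = min(len(a), len(b))
    top_prefix = prefix / shortest if shortest else 0.0

    return {"jaccard": jaccard, "top_prefix": top_prefix}
```

Feature vectors of this kind, computed for pairs of traces with known duplicate labels, could then serve as training input for an off-the-shelf classifier. Line numbers are deliberately dropped from the frames in this sketch so that small source edits do not make otherwise identical frames look different.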
