Improving bug management using correlations in crash reports

Nowadays, many software organizations rely on automatic problem reporting tools to collect crash reports directly from users’ environments. These crash reports are later grouped together into crash types. Usually, developers prioritize crash types based on the number of crash reports and file bug reports for the top crash types. Because a bug can trigger a crash in different usage scenarios, different crash types are sometimes related to the same bug. Two bugs are correlated when the occurrence of one bug causes the other bug to occur. We refer to a group of crash types related to identical or correlated bug reports, as a crash correlation group. In this paper, we propose five rules to identify correlated crash types automatically. We propose an algorithm to locate and rank buggy files using crash correlation groups. We also propose a method to identify duplicate and related bug reports. Through an empirical study on Firefox and Eclipse, we show that the first three rules can identify crash correlation groups using stack trace information, with a precision of 91 % and a recall of 87 % for Firefox and a precision of 76 % and a recall of 61 % for Eclipse. On the top three buggy file candidates, the proposed bug localization algorithm achieves a recall of 62 % and a precision of 42 % for Firefox, and a recall of 52 % and a precision of 50 % for Eclipse. On the top 10 buggy file candidates, the recall increases to 92 % for Firefox and 90 % for Eclipse. The proposed duplicate bug report identification method achieves a recall of 50 % and a precision of 55 % on Firefox, and a recall of 47 % and a precision of 35 % on Eclipse. Developers can combine the proposed crash correlation rules with the new bug localization algorithm to identify and fix correlated crash types all together. Triagers can use the duplicate bug report identification method to reduce their workload by filtering duplicate bug reports automatically.

[1]  Andreas Zeller,et al.  When do changes induce fixes? , 2005, ACM SIGSOFT Softw. Eng. Notes.

[2]  Sheng Ma,et al.  Automated Problem Determination Using Call-Stack Matching , 2005, Journal of Network and Systems Management.

[3]  Foutse Khomh,et al.  An Entropy Evaluation Approach for Triaging Field Crashes: A Case Study of Mozilla Firefox , 2011, 2011 18th Working Conference on Reverse Engineering.

[4]  John T. Stasko,et al.  Visualization of test information to assist fault localization , 2002, ICSE '02.

[5]  Siau-Cheng Khoo,et al.  A discriminative model approach for accurate duplicate bug report retrieval , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.

[6]  Siau-Cheng Khoo,et al.  Towards more accurate retrieval of duplicate bug reports , 2011, 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011).

[7]  David R. Barstow,et al.  Proceedings of the 25th International Conference on Software Engineering , 1978, ICSE.

[8]  Sooyong Park,et al.  Which Crashes Should I Fix First?: Predicting Top Crashes at an Early Stage to Prioritize Debugging Efforts , 2011, IEEE Transactions on Software Engineering.

[9]  Rui Abreu,et al.  A Survey on Software Fault Localization , 2016, IEEE Transactions on Software Engineering.

[10]  Laurie J. Heyer,et al.  Exploring expression data: identification and analysis of coexpressed genes. , 1999, Genome research.

[11]  Ying Zou,et al.  Visualizing the Results of Field Testing , 2010, 2010 IEEE 18th International Conference on Program Comprehension.

[12]  Mary Jean Harrold,et al.  Empirical evaluation of the tarantula automatic fault-localization technique , 2005, ASE.

[13]  R. Yin Case Study Research: Design and Methods , 1984 .

[14]  Galen C. Hunt,et al.  Debugging in the (very) large: ten years of implementation and experience , 2009, SOSP '09.

[15]  Shaohua Wang,et al.  Improving bug localization using correlations in crash reports , 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).

[16]  J. Kruskal An Overview of Sequence Comparison: Time Warps, String Edits, and Macromolecules , 1983 .

[17]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[18]  M. H. Margahny,et al.  FAST ALGORITHM FOR MINING ASSOCIATION RULES , 2014 .

[19]  Thomas Zimmermann,et al.  Extracting structural information from bug reports , 2008, MSR '08.

[20]  Dongmei Zhang,et al.  ReBucket: A method for clustering duplicate crash reports based on call stack similarity , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[21]  Vijay V. Raghavan,et al.  A critical analysis of vector space model for information retrieval , 1986 .

[22]  Mary Lou Soffa,et al.  Path-based fault correlations , 2010, FSE '10.

[23]  Tao Xie,et al.  An approach to detecting duplicate bug reports using natural language and execution information , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[24]  Bin Wang,et al.  Automated support for classifying software failure reports , 2003, 25th International Conference on Software Engineering, 2003. Proceedings..

[25]  Jiawei Han,et al.  BIDE: efficient mining of frequent closed sequences , 2004, Proceedings. 20th International Conference on Data Engineering.

[26]  K RajamaniSriram,et al.  From symptom to cause , 2003 .

[27]  Foutse Khomh,et al.  Classifying field crash reports for fixing bugs: A case study of Mozilla Firefox , 2011, 2011 27th IEEE International Conference on Software Maintenance (ICSM).

[28]  Nachiappan Nagappan,et al.  Crash graphs: An aggregated view of multiple crashes to improve crash triage , 2011, 2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN).

[29]  Mayur Naik,et al.  From symptom to cause: localizing errors in counterexample traces , 2003, POPL '03.

[30]  Michael I. Jordan,et al.  Scalable statistical bug isolation , 2005, PLDI '05.

[31]  David J. Spiegelhalter,et al.  Machine Learning, Neural and Statistical Classification , 2009 .

[32]  Rahul Premraj,et al.  Do stack traces help developers fix bugs? , 2010, 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010).

[33]  Mayur Naik,et al.  From symptom to cause: localizing errors in counterexample traces , 2003, POPL '03.

[34]  Latifur Khan,et al.  Software Fault Localization Using N-gram Analysis , 2008, WASA.

[35]  W. Eric Wong,et al.  Software Fault Localization , 2010, Encyclopedia of Software Engineering.