Deep Learning Based Valid Bug Reports Determination and Explanation

Bug reports are widely used by developers to fix bugs. Because reporters often lack experience, they may submit numerous invalid bug reports. Manually determining which bug reports are valid is laborious, so identifying them automatically can save time and effort in bug analysis. In this paper, we propose a deep learning-based approach to determine and explain valid bug reports using only textual information, i.e., the summaries and descriptions of bug reports. A convolutional neural network (CNN) is applied to capture their contextual and semantic features. Moreover, by analyzing the spatial structure of the CNN, we backtrack through the trained model to extract the phrases that explain why a bug report is determined to be valid. After inspecting these phrases manually, we summarize a set of valid bug report patterns. We evaluate our approach on five large-scale open-source projects containing a total of 540,491 bug reports. Averaged across the five projects, our approach achieves an AUC of 0.85, an F1-score of 0.80 for valid bug reports, and an F1-score of 0.69 for invalid bug reports, improving on the state-of-the-art approach by 8.97%, 9.59%, and 9.52%, respectively. The summarized patterns indicate that the determination of valid bug reports is driven mainly by three categories of patterns: Attachment, Environment, and Reproduce.
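
The backtracking idea can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single convolutional layer with ReLU followed by max-over-time pooling and a linear output layer, so that each filter's pooled value can be traced back to the token window that produced it. The function name `backtrack_phrases`, the toy embeddings, the filters, and the class weights are all hypothetical values chosen for illustration.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Filter = List[Vector]  # one weight vector per token position in the window


def backtrack_phrases(
    tokens: List[str],
    embeddings: Dict[str, Vector],
    filters: List[Filter],
    class_weights: List[float],
    top_n: int = 1,
) -> List[Tuple[float, str]]:
    """Trace each filter's max-pooled activation back to its token window.

    Assumed architecture (hypothetical): conv -> ReLU -> max-over-time
    pooling -> linear layer. class_weights[j] is the weight that connects
    filter j's pooled value to the "valid" output.
    """
    scored: List[Tuple[float, str]] = []
    for filt, weight in zip(filters, class_weights):
        width = len(filt)  # filter width in tokens
        best_pos, best_act = -1, float("-inf")
        # Slide the filter over every window of `width` consecutive tokens.
        for start in range(len(tokens) - width + 1):
            window = tokens[start:start + width]
            act = sum(
                w * e
                for row, tok in zip(filt, window)
                for w, e in zip(row, embeddings[tok])
            )
            act = max(act, 0.0)  # ReLU
            if act > best_act:  # keep the first window reaching the maximum
                best_pos, best_act = start, act
        if best_pos >= 0:
            phrase = " ".join(tokens[best_pos:best_pos + width])
            # Contribution of this filter's pooled value to the "valid" score.
            scored.append((best_act * weight, phrase))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_n]


# Toy inputs (all hypothetical): 2-dimensional embeddings and two bigram filters.
tokens = ["see", "attached", "screenshot", "for", "details"]
embeddings = {
    "see": [0.0, 0.0],
    "attached": [1.0, 0.0],
    "screenshot": [0.0, 1.0],
    "for": [0.0, 0.0],
    "details": [0.0, 0.0],
}
filters = [
    [[1.0, 0.0], [0.0, 1.0]],  # responds to "attached" followed by "screenshot"
    [[0.0, 1.0], [0.0, 0.0]],  # responds to "screenshot" in the first slot
]
class_weights = [1.5, -0.5]

top = backtrack_phrases(tokens, embeddings, filters, class_weights, top_n=1)
print(top)
```

In this toy setting, the phrase with the largest contribution to the "valid" score is the one a reviewer would then inspect by hand, which mirrors the manual pattern summarization step described above.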
