BugSum: Deep Context Understanding for Bug Report Summarization

During collaborative software development, bug reports are maintained and evolve as part of a software project. For a historical bug report with complicated discussions, an accurate and concise summary can reduce the time and effort stakeholders spend perusing the entire content. Existing studies on bug report summarization, whether based on supervised or unsupervised techniques, are limited because they do not account for redundant information and disapproved standpoints among developers' comments. Accordingly, in this paper, we propose a novel unsupervised approach based on a deep learning network, called BugSum. Our approach integrates an auto-encoder network for feature extraction with a novel metric (believability) that measures the degree to which a sentence is approved or disapproved within discussions. In addition, a dynamic selection strategy is employed to optimize the comprehensiveness of the auto-generated summary under a limited word budget. Extensive experiments show that our approach outperforms 8 comparative approaches over two public datasets. In particular, the probability that the summary includes controversial sentences clearly disapproved by other developers during the discussion is reduced by up to 69.6%.
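
To make the interplay between believability and a word-limited selection concrete, the following is a minimal sketch and not the BugSum implementation. It assumes each sentence already has a feature vector (a stand-in for auto-encoder output) and a believability score in [0, 1]. It swaps in a simple greedy, MMR-style rule for the dynamic selection strategy. The names `Sentence`, `cosine`, and `select_summary` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Sentence:
    text: str
    vector: List[float]    # assumed stand-in for a learned sentence feature vector
    believability: float   # assumed score in [0, 1]; low means disapproved


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def select_summary(sentences: List[Sentence], word_limit: int) -> List[Sentence]:
    """Greedily pick sentences that are believable and not redundant
    with what is already chosen, without exceeding the word limit."""
    chosen: List[Sentence] = []
    remaining = list(sentences)
    used = 0

    def gain(s: Sentence) -> float:
        # Redundancy is the highest similarity to any already chosen sentence.
        redundancy = max((cosine(s.vector, c.vector) for c in chosen), default=0.0)
        return s.believability * (1.0 - redundancy)

    while remaining:
        # Only consider sentences that still fit in the word budget.
        fitting = [s for s in remaining if used + len(s.text.split()) <= word_limit]
        if not fitting:
            break
        best = max(fitting, key=gain)
        if gain(best) <= 0.0:
            break  # everything left is fully redundant or fully disapproved
        chosen.append(best)
        used += len(best.text.split())
        remaining.remove(best)
    return chosen
```

In this sketch, a sentence that repeats an already selected one, or that carries a low believability score, receives a small gain and is unlikely to enter the summary before the word budget is exhausted.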
