Identifying and predicting key features to support bug reporting

Bug reports are the primary means through which developers triage and fix bugs. To support this process effectively, bug reports need to clearly describe the features that matter most to developers. However, previous studies have found that reporters do not always provide such features. We therefore first perform an exploratory study to identify the key features that reporters most frequently omit from their initial bug report submissions. We then propose an approach that predicts whether reporters should provide certain key features to ensure a good bug report. A case study of bug reports from the Camel, Derby, Wicket, Firefox, and Thunderbird projects shows that Steps to Reproduce, Test Case, Code Example, Stack Trace, and Expected Behavior are the features that reporters most often omit from their initial submissions. We also find that these features significantly affect the bug-fixing process. On the basis of these findings, we build and evaluate classification models that leverage historical bug-fixing knowledge, using four different text-classification techniques, to predict the key features a report needs. The evaluation results show that our models can effectively predict these key features, and our comparison of the text-classification techniques shows that naïve Bayes multinomial (NBM) outperforms the others. Our findings can help reporters improve the content of their bug reports.
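To make the prediction step concrete, the sketch below shows one way such a classifier could be assembled. It is a minimal illustration only: it assumes scikit-learn's CountVectorizer and MultinomialNB as the bag-of-words and NBM components, and the report texts, labels, and target feature (Steps to Reproduce) are invented placeholders, not the study's Camel, Derby, Wicket, Firefox, and Thunderbird data.

```python
# Minimal sketch of an NBM text classifier that flags whether a newly
# submitted bug report is likely to need "Steps to Reproduce".
# All training data below is hypothetical, for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical historical reports: label 1 if the fixing process later
# required the reporter to add Steps to Reproduce, 0 otherwise.
train_texts = [
    "App crashes when clicking the save button after editing a record",
    "NullPointerException in route builder, stack trace attached",
    "UI freezes intermittently while scrolling a large message list",
    "Build fails with missing dependency, see attached log",
]
train_labels = [1, 0, 1, 0]

# Bag-of-words features feeding a multinomial naive Bayes classifier.
model = make_pipeline(
    CountVectorizer(lowercase=True, stop_words="english"),
    MultinomialNB(),
)
model.fit(train_texts, train_labels)

# Score an incoming report before it is submitted.
new_report = ["Editor window closes unexpectedly when pasting formatted text"]
print(model.predict(new_report))        # e.g. [1] -> ask for Steps to Reproduce
print(model.predict_proba(new_report))  # class probabilities
```

In practice, one such binary classifier would be trained per key feature (Steps to Reproduce, Test Case, Code Example, Stack Trace, Expected Behavior), so a bug-tracking front end could prompt the reporter for exactly the features the model predicts are missing.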
