Assessing the quality of the steps to reproduce in bug reports

A major problem with user-written bug reports, reported by developers and documented by researchers, is the low quality of the reported steps to reproduce the bugs. Low-quality steps to reproduce lead to excessive manual effort during bug triage and resolution. This paper proposes Euler, an approach that automatically identifies the steps to reproduce in a bug report and assesses their quality, giving reporters feedback they can use to improve the report. External evaluators assessed the feedback provided by Euler, and the results indicate that Euler correctly identified 98% of the existing steps to reproduce and 58% of the missing ones, while 73% of its quality annotations are correct.
