Chaff from the Wheat: Characterizing and Determining Valid Bug Reports

Developers use bug reports to triage and fix bugs. When triaging a bug report, developers must decide whether the report is valid (i.e., describes a real bug). A large number of bug reports are submitted every day, and many of them turn out to be invalid. Manually determining whether a bug report is valid is a difficult and tedious task. An approach that automatically analyzes a newly submitted report and determines whether it is valid can therefore help developers prioritize their triaging tasks and avoid wasting time and effort on invalid bug reports. Motivated by these needs, we propose an approach that determines whether a newly submitted bug report is valid. Our approach first extracts 33 features from bug reports, grouped along five dimensions: reporter experience, collaboration network, completeness, readability, and text. Based on these features, we use a random forest classifier to identify valid bug reports. To evaluate the effectiveness of our approach, we experiment on large-scale datasets containing a total of 560,697 bug reports from five open source projects (Eclipse, Netbeans, Mozilla, Firefox, and Thunderbird). On average across the five datasets, our approach achieves F1-scores of 0.74 for valid bug reports and 0.67 for invalid ones, and an average AUC of 0.81. In terms of AUC and F1-scores for valid and invalid bug reports, our approach statistically significantly outperforms two baselines built on the features proposed by Zanetti et al. [104]. We also study which features best distinguish valid bug reports from invalid ones, and find that a report's textual features and the reporter's experience are the most important factors.
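To make the described pipeline concrete, below is a minimal sketch of a valid-report classifier, assuming scikit-learn and pandas. The input file name, column names, and train/test split are hypothetical placeholders rather than the paper's actual setup; the paper itself uses 33 pre-extracted features grouped into the five dimensions above and a random forest classifier evaluated with F1-scores and AUC.

```python
# Minimal sketch, not the paper's implementation: a random forest over
# pre-extracted bug-report features, evaluated with F1 and AUC.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical input: one row per bug report, numeric feature columns
# plus a binary label (1 = valid, 0 = invalid).
reports = pd.read_csv("bug_report_features.csv")
X = reports.drop(columns=["is_valid"])
y = reports["is_valid"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

pred = clf.predict(X_test)
prob = clf.predict_proba(X_test)[:, 1]

print("F1 (valid):  ", f1_score(y_test, pred, pos_label=1))
print("F1 (invalid):", f1_score(y_test, pred, pos_label=0))
print("AUC:         ", roc_auc_score(y_test, prob))

# Feature importances hint at which dimensions matter most; the paper
# reports textual features and reporter experience as the top factors.
for name, imp in sorted(zip(X.columns, clf.feature_importances_),
                        key=lambda t: t[1], reverse=True)[:5]:
    print(f"{name}: {imp:.3f}")
```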

[1] Andrew P. Bradley et al. The use of the area under the ROC curve in the evaluation of machine learning algorithms, 1997, Pattern Recognit.

[2] David Lo et al. Identifying self-admitted technical debt in open source projects using text mining, 2017, Empirical Software Engineering.

[3] L. Freeman. Centrality in social networks: conceptual clarification, 1978.

[4] David W. Hosmer et al. Applied Logistic Regression, 1991.

[5] Aric Hagberg et al. Exploring Network Structure, Dynamics, and Function using NetworkX, 2008.

[6] H. B. Mann et al. On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other, 1947.

[7] Siau-Cheng Khoo et al. Towards more accurate retrieval of duplicate bug reports, 2011, 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011).

[8] Rudolf Franz Flesch et al. How to write plain English: a book for lawyers and consumers, 1979.

[9] Jonathan Anderson. Lix and Rix: Variations on a Little-Known Readability Index, 1983.

[10] Tim Menzies et al. Automated severity assessment of software defect reports, 2008, IEEE International Conference on Software Maintenance.

[11] R. Gunning. The Technique of Clear Writing, 1968.

[12] Ken-ichi Matsumoto et al. Studying re-opened bugs in open source software, 2012, Empirical Software Engineering.

[13] F. Wilcoxon. Individual Comparisons by Ranking Methods, 1945.

[14] Michel R. V. Chaudron et al. An Analysis of Machine Learning Algorithms for Condensing Reverse Engineered Class Diagrams, 2013, IEEE International Conference on Software Maintenance.

[15] Rainer Storn et al. Differential Evolution - A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces, 1997, J. Glob. Optim.

[16] Gabriele Bavota et al. Detecting missing information in bug descriptions, 2017, ESEC/SIGSOFT FSE.

[17] Gina Venolia et al. The secret life of bugs: Going past the errors and omissions in software repositories, 2009, IEEE 31st International Conference on Software Engineering.

[18] A. Scott et al. A Cluster Analysis Method for Grouping Means in the Analysis of Variance, 1974.

[19] Philip S. Yu et al. Top 10 algorithms in data mining, 2007, Knowledge and Information Systems.

[20] Yasutaka Kamei et al. The Impact of Using Regression Models to Build Defect Classifiers, 2017, IEEE/ACM 14th International Conference on Mining Software Repositories (MSR).

[21] David Lo et al. Predicting Crashing Releases of Mobile Applications, 2016, ESEM.

[22] Fabrizio Sebastiani et al. Machine learning in automated text categorization, 2001, CSUR.

[23] Charles X. Ling et al. Using AUC and accuracy in evaluating learning algorithms, 2005, IEEE Transactions on Knowledge and Data Engineering.

[24] David Lo et al. A Comparative Study of Supervised Learning Algorithms for Re-opened Bug Prediction, 2013, CSMR.

[25] José M. Chaves-González et al. Differential evolution with Pareto tournament for the multi-objective next release problem, 2015, Appl. Math. Comput.

[26] Tim Menzies et al. What is wrong with topic modeling? And how to fix it using search-based software engineering, 2016, Inf. Softw. Technol.

[27] Philip J. Guo et al. Characterizing and predicting which bugs get fixed: an empirical study of Microsoft Windows, 2010, ACM/IEEE 32nd International Conference on Software Engineering.

[28] David Lo et al. Dual analysis for recommending developers to resolve bugs, 2015, J. Softw. Evol. Process.

[29] W. B. Cavnar et al. N-gram-based text categorization, 1994.

[30] Ingo Scholtes et al. Categorizing bugs with social networks: A case study on four open source software communities, 2013, 35th International Conference on Software Engineering (ICSE).

[31] David Lo et al. Early prediction of merged code changes to prioritize reviewing tasks, 2018, Empirical Software Engineering.

[32] Wouter Joosen et al. Predicting Vulnerable Software Components via Text Mining, 2014, IEEE Transactions on Software Engineering.

[33] G. Harry McLaughlin et al. SMOG Grading - A New Readability Formula, 1969.

[34] Pierre Baldi et al. Mining the coherence of GNOME bug reports with statistical topic models, 2009, 6th IEEE International Working Conference on Mining Software Repositories.

[35] Ken-ichi Matsumoto et al. Predicting Re-opened Bugs: A Case Study on the Eclipse Project, 2010, 17th Working Conference on Reverse Engineering.

[36] David Lo et al. Automating Change-Level Self-Admitted Technical Debt Determination, 2019, IEEE Transactions on Software Engineering.

[37] Westley Weimer et al. Patches as better bug reports, 2006, GPCE '06.

[38] Huan Liu et al. Feature Engineering for Machine Learning and Data Analytics, 2018.

[39] R. Flesch. A new readability yardstick, 1948, The Journal of Applied Psychology.

[40] Ahmed Tamrawi et al. Fuzzy set-based automatic bug triaging: NIER track, 2011, 33rd International Conference on Software Engineering (ICSE).

[41] Siau-Cheng Khoo et al. A discriminative model approach for accurate duplicate bug report retrieval, 2010, ACM/IEEE 32nd International Conference on Software Engineering.

[42] Ahmed E. Hassan et al. Studying the Impact of Social Structures on Software Quality, 2010, IEEE 18th International Conference on Program Comprehension.

[43] Christian Robottom Reis et al. An Overview of the Software Engineering Process and Tools in the Mozilla Project, 2002.

[44] David Lo et al. Information Retrieval Based Nearest Neighbor Classification for Fine-Grained Bug Severity Prediction, 2012, 19th Working Conference on Reverse Engineering.

[45] Tim Menzies et al. Why is Differential Evolution Better than Grid Search for Tuning Defect Predictors?, 2016, ArXiv.

[46] Ying Zou et al. Towards just-in-time suggestions for log changes, 2016, Empirical Software Engineering.

[47] Tao Xie et al. An approach to detecting duplicate bug reports using natural language and execution information, 2008, ACM/IEEE 30th International Conference on Software Engineering.

[48] David R. Karger et al. Tackling the Poor Assumptions of Naive Bayes Text Classifiers, 2003, ICML.

[49] Uirá Kulesza et al. An Empirical Study of Delays in the Integration of Addressed Issues, 2014, IEEE International Conference on Software Maintenance and Evolution.

[50] David Lo et al. Automated prediction of bug report priority using multi-factor analysis, 2014, Empirical Software Engineering.

[51] David Lo et al. Feature Generation and Engineering for Software Analytics, 2018.

[52] Bart Baesens et al. Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings, 2008, IEEE Transactions on Software Engineering.

[53] Chih-Jen Lin et al. LIBSVM: A library for support vector machines, 2011, TIST.

[54] Andrew McCallum et al. A comparison of event models for naive Bayes text classification, 1998, AAAI.

[55] Yu Zhou et al. Combining Text Mining and Data Mining for Bug Report Classification, 2014, IEEE International Conference on Software Maintenance and Evolution.

[56] Gerhard Widmer et al. Learning in the Presence of Concept Drift and Hidden Contexts, 1996, Machine Learning.

[57] David Lo et al. Automatic, high accuracy prediction of reopened bugs, 2014, Automated Software Engineering.

[58] Ahmed E. Hassan et al. Studying the impact of social interactions on software quality, 2012, Empirical Software Engineering.

[59] Kate Ehrlich et al. All-for-one and one-for-all?: a multi-level analysis of communication patterns and individual performance in geographically distributed software development, 2012, CSCW.

[60] R. P. Fishburne et al. Derivation of New Readability Formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy Enlisted Personnel, 1975.

[61] Stan Matwin et al. Discriminative parameter learning for Bayesian networks, 2008, ICML '08.

[62] Michael W. Godfrey et al. Code Review Quality: How Developers See It, 2016, IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[63] Westley Weimer et al. Modeling bug report quality, 2007, ASE '07.

[64] Yiming Yang et al. A Comparative Study on Feature Selection in Text Categorization, 1997, ICML.

[65] Shane McIntosh et al. An Empirical Comparison of Model Validation Techniques for Defect Prediction Models, 2017, IEEE Transactions on Software Engineering.

[66] David H. Wolpert et al. An Efficient Method To Estimate Bagging's Generalization Error, 1999, Machine Learning.

[67] Gail C. Murphy et al. Automatic bug triage using text categorization, 2004, SEKE.

[68] Tim Menzies et al. What is Wrong with Topic Modeling? (and How to Fix it Using Search-based SE), 2016, ArXiv.

[69] Yuval Shavitt et al. A model of Internet topology using k-shell decomposition, 2007, Proceedings of the National Academy of Sciences.

[70] Ulrik Brandes et al. Centrality Estimation in Large Networks, 2007, Int. J. Bifurc. Chaos.

[71] Zhenchang Xing et al. What do developers search for on the web?, 2017, Empirical Software Engineering.

[72] Thomas Zimmermann et al. Improving bug triage with bug tossing graphs, 2009, ESEC/FSE '09.

[73] Ian H. Witten et al. The WEKA data mining software: an update, 2009, SIGKDD Explor.

[74] Krzysztof Czarnecki et al. Towards improving bug tracking systems with game mechanisms, 2012, 9th IEEE Working Conference on Mining Software Repositories (MSR).

[75] Thomas Zimmermann et al. What Makes a Good Bug Report?, 2008, IEEE Transactions on Software Engineering.

[76] P. Bonacich. Power and Centrality: A Family of Measures, 1987, American Journal of Sociology.

[77] Mikko Kivelä et al. Generalizations of the clustering coefficient to weighted complex networks, 2006, Physical Review E: Statistical, Nonlinear, and Soft Matter Physics.

[78] Rahul Premraj et al. Do stack traces help developers fix bugs?, 2010, 7th IEEE Working Conference on Mining Software Repositories (MSR 2010).

[79] J. Platt. Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines, 1998.

[80] Iulian Neamtiu et al. Bug-fix time prediction models: can we do better?, 2011, MSR '11.

[81] Ahmed E. Hassan et al. An Experience Report on Defect Modelling in Practice: Pitfalls and Challenges, 2017, IEEE/ACM 40th International Conference on Software Engineering: Software Engineering in Practice Track (ICSE-SEIP).

[82] Sven Apel et al. Evolutionary trends of developer coordination: a network approach, 2015, Empirical Software Engineering.

[83] Thomas Zimmermann et al. Extracting structural information from bug reports, 2008, MSR '08.

[84] Yi Zhang et al. Classifying Software Changes: Clean or Buggy?, 2008, IEEE Transactions on Software Engineering.

[85] Emad Shihab et al. Characterizing and predicting blocking bugs in open source projects, 2014, MSR.

[86] Zhenchang Xing et al. Who Will Leave the Company?: A Large-Scale Industry Study of Developer Turnover by Mining Monthly Work Report, 2017, IEEE/ACM 14th International Conference on Mining Software Repositories (MSR).

[87] Tim Menzies et al. Easy over hard: a case study on deep learning, 2017, ESEC/SIGSOFT FSE.

[88] Liang Gong et al. Predicting bug-fixing time: An empirical study of commercial software projects, 2013, 35th International Conference on Software Engineering (ICSE).

[89] Bart Goethals et al. Predicting the severity of a reported bug, 2010, 7th IEEE Working Conference on Mining Software Repositories (MSR 2010).

[90] David Lo et al. ELBlocker: Predicting blocking bugs with ensemble imbalance learning, 2015, Inf. Softw. Technol.

[91] David Lo et al. Duplicate bug report detection with a combination of information retrieval and topic modeling, 2012, 27th IEEE/ACM International Conference on Automated Software Engineering.

[92] M. Coleman et al. A computer readability formula designed for machine scoring, 1975.

[93] Isabelle Guyon et al. An Introduction to Variable and Feature Selection, 2003, J. Mach. Learn. Res.

[94] Tao Zhang et al. Bug Report Enrichment with Application of Automated Fixer Recommendation, 2017, IEEE/ACM 25th International Conference on Program Comprehension (ICPC).

[95] Tian Jiang et al. Personalized defect prediction, 2013, 28th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[96] Petra Perner et al. Data Mining - Concepts and Techniques, 2002, Künstliche Intell.

[97] Leo Breiman et al. Random Forests, 2001, Machine Learning.

[98] David Lo et al. Improving Automated Bug Triaging with Specialized Topic Model, 2017, IEEE Transactions on Software Engineering.

[99] D. Williamson et al. The box plot: a simple visual method to interpret data, 1989, Annals of Internal Medicine.

[100] E. A. Smith et al. Automated readability index, 1967, AMRL-TR, Aerospace Medical Research Laboratories.

[101] Iulian Neamtiu et al. Fine-grained incremental learning and multi-feature tossing graphs to improve bug triaging, 2010, IEEE International Conference on Software Maintenance.

[102] Thomas Zimmermann et al. Towards the next generation of bug tracking systems, 2008, IEEE Symposium on Visual Languages and Human-Centric Computing.

[103] David Lo et al. What are the characteristics of high-rated apps? A case study on free Android Applications, 2015, IEEE International Conference on Software Maintenance and Evolution (ICSME).

[104] Dane Bertram et al. Communication, collaboration, and bugs: the social nature of issue tracking in small, collocated teams, 2010, CSCW '10.

[105] Zarinah Mohd Kasirun et al. Why so complicated? Simple term filtering and weighting for location-based bug report assignment recommendation, 2013, 10th Working Conference on Mining Software Repositories (MSR).

[106] Gail C. Murphy et al. Who should fix this bug?, 2006, ICSE.

[107] N. Cliff. Ordinal methods for behavioral data analysis, 1996.

[108] David Lo et al. Accurate developer recommendation for bug resolution, 2013, 20th Working Conference on Reverse Engineering (WCRE).

[109] Philip J. Guo et al. Characterizing and predicting which bugs get reopened, 2012, 34th International Conference on Software Engineering (ICSE).