Chaff from the Wheat: Characterizing and Determining Valid Bug Reports

Developers use bug reports to triage and fix bugs. When triaging a bug report, developers must decide whether the report is valid (i.e., describes a real bug). A large number of bug reports are submitted every day, and many of them turn out to be invalid. Manually determining whether a bug report is valid is a difficult and tedious task. An approach that automatically analyzes a newly submitted report and determines whether it is valid can therefore help developers prioritize their triaging tasks and avoid wasting time and effort on invalid bug reports. Motivated by these needs, we propose an approach that determines whether a newly submitted bug report is valid. Our approach first extracts 33 features from bug reports, grouped along five dimensions: reporter experience, collaboration network, completeness, readability, and text. Based on these features, we use a random forest classifier to identify valid bug reports. To evaluate the effectiveness of our approach, we experiment on large-scale datasets containing a total of 560,697 bug reports from five open source projects (Eclipse, Netbeans, Mozilla, Firefox, and Thunderbird). On average across the five datasets, our approach achieves F1-scores of 0.74 for valid bug reports and 0.67 for invalid ones, and an average AUC of 0.81. In terms of AUC and F1-scores for valid and invalid bug reports, our approach statistically significantly outperforms two baselines built on the features proposed by Zanetti et al. [104]. We also study which features best distinguish valid bug reports from invalid ones, and find that a report's textual features and the reporter's experience are the most important factors.
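To make the described pipeline concrete, below is a minimal sketch of a valid-report classifier, assuming scikit-learn and pandas. The input file name, column names, and train/test split are hypothetical placeholders rather than the paper's actual setup; the paper itself uses 33 pre-extracted features grouped into the five dimensions above and a random forest classifier evaluated with F1-scores and AUC.

```python
# Minimal sketch, not the paper's implementation: a random forest over
# pre-extracted bug-report features, evaluated with F1 and AUC.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical input: one row per bug report, numeric feature columns
# plus a binary label (1 = valid, 0 = invalid).
reports = pd.read_csv("bug_report_features.csv")
X = reports.drop(columns=["is_valid"])
y = reports["is_valid"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

pred = clf.predict(X_test)
prob = clf.predict_proba(X_test)[:, 1]

print("F1 (valid):  ", f1_score(y_test, pred, pos_label=1))
print("F1 (invalid):", f1_score(y_test, pred, pos_label=0))
print("AUC:         ", roc_auc_score(y_test, prob))

# Feature importances hint at which dimensions matter most; the paper
# reports textual features and reporter experience as the top factors.
for name, imp in sorted(zip(X.columns, clf.feature_importances_),
                        key=lambda t: t[1], reverse=True)[:5]:
    print(f"{name}: {imp:.3f}")
```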

[1] Andrew P. Bradley et al. The use of the area under the ROC curve in the evaluation of machine learning algorithms, 1997, Pattern Recognit.

[2] David Lo et al. Identifying self-admitted technical debt in open source projects using text mining, 2017, Empirical Software Engineering.

[3] L. Freeman. Centrality in social networks: conceptual clarification, 1978.

[4] David W. Hosmer et al. Applied Logistic Regression, 1991.

[5] Aric Hagberg et al. Exploring Network Structure, Dynamics, and Function using NetworkX, 2008.

[6] H. B. Mann et al. On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other, 1947.

[7] Siau-Cheng Khoo et al. Towards more accurate retrieval of duplicate bug reports, 2011, 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011).

[8] Rudolf Franz Flesch et al. How to write plain English: a book for lawyers and consumers, 1979.

[9] Jonathan Anderson. Lix and Rix: Variations on a Little-Known Readability Index, 1983.

[10] Tim Menzies et al. Automated severity assessment of software defect reports, 2008, IEEE International Conference on Software Maintenance.

[11] R. Gunning. The Technique of Clear Writing, 1968.

[12] Ken-ichi Matsumoto et al. Studying re-opened bugs in open source software, 2012, Empirical Software Engineering.

[13] F. Wilcoxon. Individual Comparisons by Ranking Methods, 1945.

[14] Michel R. V. Chaudron et al. An Analysis of Machine Learning Algorithms for Condensing Reverse Engineered Class Diagrams, 2013, IEEE International Conference on Software Maintenance.

[15] Rainer Storn et al. Differential Evolution - A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces, 1997, J. Glob. Optim.

[16] Gabriele Bavota et al. Detecting missing information in bug descriptions, 2017, ESEC/SIGSOFT FSE.

[17] Gina Venolia et al. The secret life of bugs: Going past the errors and omissions in software repositories, 2009, IEEE 31st International Conference on Software Engineering.

[18] A. Scott et al. A Cluster Analysis Method for Grouping Means in the Analysis of Variance, 1974.

[19] Philip S. Yu et al. Top 10 algorithms in data mining, 2007, Knowledge and Information Systems.

[20] Yasutaka Kamei et al. The Impact of Using Regression Models to Build Defect Classifiers, 2017, IEEE/ACM 14th International Conference on Mining Software Repositories (MSR).

[21] David Lo et al. Predicting Crashing Releases of Mobile Applications, 2016, ESEM.

[22] Fabrizio Sebastiani et al. Machine learning in automated text categorization, 2001, CSUR.

[23] Charles X. Ling et al. Using AUC and accuracy in evaluating learning algorithms, 2005, IEEE Transactions on Knowledge and Data Engineering.

[24] David Lo et al. A Comparative Study of Supervised Learning Algorithms for Re-opened Bug Prediction, 2013, CSMR.

[25] José M. Chaves-González et al. Differential evolution with Pareto tournament for the multi-objective next release problem, 2015, Appl. Math. Comput.

[26] Tim Menzies et al. What is wrong with topic modeling? And how to fix it using search-based software engineering, 2016, Inf. Softw. Technol.

[27] Philip J. Guo et al. Characterizing and predicting which bugs get fixed: an empirical study of Microsoft Windows, 2010, ACM/IEEE 32nd International Conference on Software Engineering.

[28] David Lo et al. Dual analysis for recommending developers to resolve bugs, 2015, J. Softw. Evol. Process.

[29] W. B. Cavnar et al. N-gram-based text categorization, 1994.

[30] Ingo Scholtes et al. Categorizing bugs with social networks: A case study on four open source software communities, 2013, 35th International Conference on Software Engineering (ICSE).

[31] David Lo et al. Early prediction of merged code changes to prioritize reviewing tasks, 2018, Empirical Software Engineering.

[32] Wouter Joosen et al. Predicting Vulnerable Software Components via Text Mining, 2014, IEEE Transactions on Software Engineering.

[33] G. Harry McLaughlin et al. SMOG Grading - A New Readability Formula, 1969.

[34] Pierre Baldi et al. Mining the coherence of GNOME bug reports with statistical topic models, 2009, 6th IEEE International Working Conference on Mining Software Repositories.

[35] Ken-ichi Matsumoto et al. Predicting Re-opened Bugs: A Case Study on the Eclipse Project, 2010, 17th Working Conference on Reverse Engineering.

[36] David Lo et al. Automating Change-Level Self-Admitted Technical Debt Determination, 2019, IEEE Transactions on Software Engineering.

[37] Westley Weimer et al. Patches as better bug reports, 2006, GPCE '06.

[38] Huan Liu et al. Feature Engineering for Machine Learning and Data Analytics, 2018.

[39] R. Flesch. A new readability yardstick, 1948, The Journal of Applied Psychology.

[40] Ahmed Tamrawi et al. Fuzzy set-based automatic bug triaging: NIER track, 2011, 33rd International Conference on Software Engineering (ICSE).

[41] Siau-Cheng Khoo et al. A discriminative model approach for accurate duplicate bug report retrieval, 2010, ACM/IEEE 32nd International Conference on Software Engineering.

[42] Ahmed E. Hassan et al. Studying the Impact of Social Structures on Software Quality, 2010, IEEE 18th International Conference on Program Comprehension.

[43] Christian Robottom Reis et al. An Overview of the Software Engineering Process and Tools in the Mozilla Project, 2002.

[44] David Lo et al. Information Retrieval Based Nearest Neighbor Classification for Fine-Grained Bug Severity Prediction, 2012, 19th Working Conference on Reverse Engineering.

[45] Tim Menzies et al. Why is Differential Evolution Better than Grid Search for Tuning Defect Predictors?, 2016, ArXiv.

[46] Ying Zou et al. Towards just-in-time suggestions for log changes, 2016, Empirical Software Engineering.

[47] Tao Xie et al. An approach to detecting duplicate bug reports using natural language and execution information, 2008, ACM/IEEE 30th International Conference on Software Engineering.

[48] David R. Karger et al. Tackling the Poor Assumptions of Naive Bayes Text Classifiers, 2003, ICML.

[49] Uirá Kulesza et al. An Empirical Study of Delays in the Integration of Addressed Issues, 2014, IEEE International Conference on Software Maintenance and Evolution.

[50] David Lo et al. Automated prediction of bug report priority using multi-factor analysis, 2014, Empirical Software Engineering.

[51] David Lo et al. Feature Generation and Engineering for Software Analytics, 2018.

[52] Bart Baesens et al. Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings, 2008, IEEE Transactions on Software Engineering.

[53] Chih-Jen Lin et al. LIBSVM: A library for support vector machines, 2011, TIST.

[54] Andrew McCallum et al. A comparison of event models for naive Bayes text classification, 1998, AAAI.

[55] Yu Zhou et al. Combining Text Mining and Data Mining for Bug Report Classification, 2014, IEEE International Conference on Software Maintenance and Evolution.

[56] Gerhard Widmer et al. Learning in the Presence of Concept Drift and Hidden Contexts, 1996, Machine Learning.

[57] David Lo et al. Automatic, high accuracy prediction of reopened bugs, 2014, Automated Software Engineering.

[58] Ahmed E. Hassan et al. Studying the impact of social interactions on software quality, 2012, Empirical Software Engineering.

[59] Kate Ehrlich et al. All-for-one and one-for-all?: a multi-level analysis of communication patterns and individual performance in geographically distributed software development, 2012, CSCW.

[60] R. P. Fishburne et al. Derivation of New Readability Formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy Enlisted Personnel, 1975.

[61] Stan Matwin et al. Discriminative parameter learning for Bayesian networks, 2008, ICML '08.

[62] Michael W. Godfrey et al. Code Review Quality: How Developers See It, 2016, IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[63] Westley Weimer et al. Modeling bug report quality, 2007, ASE '07.

[64] Yiming Yang et al. A Comparative Study on Feature Selection in Text Categorization, 1997, ICML.

[65] Shane McIntosh et al. An Empirical Comparison of Model Validation Techniques for Defect Prediction Models, 2017, IEEE Transactions on Software Engineering.

[66] David H. Wolpert et al. An Efficient Method To Estimate Bagging's Generalization Error, 1999, Machine Learning.

[67] Gail C. Murphy et al. Automatic bug triage using text categorization, 2004, SEKE.

[68] Tim Menzies et al. What is Wrong with Topic Modeling? (and How to Fix it Using Search-based SE), 2016, ArXiv.

[69] Yuval Shavitt et al. A model of Internet topology using k-shell decomposition, 2007, Proceedings of the National Academy of Sciences.

[70] Ulrik Brandes et al. Centrality Estimation in Large Networks, 2007, Int. J. Bifurc. Chaos.

[71] Zhenchang Xing et al. What do developers search for on the web?, 2017, Empirical Software Engineering.

[72] Thomas Zimmermann et al. Improving bug triage with bug tossing graphs, 2009, ESEC/FSE '09.

[73] Ian H. Witten et al. The WEKA data mining software: an update, 2009, SIGKDD Explor.

[74] Krzysztof Czarnecki et al. Towards improving bug tracking systems with game mechanisms, 2012, 9th IEEE Working Conference on Mining Software Repositories (MSR).

[75] Thomas Zimmermann et al. What Makes a Good Bug Report?, 2008, IEEE Transactions on Software Engineering.

[76] P. Bonacich. Power and Centrality: A Family of Measures, 1987, American Journal of Sociology.

[77] Mikko Kivelä et al. Generalizations of the clustering coefficient to weighted complex networks, 2006, Physical Review E: Statistical, Nonlinear, and Soft Matter Physics.

[78] Rahul Premraj et al. Do stack traces help developers fix bugs?, 2010, 7th IEEE Working Conference on Mining Software Repositories (MSR 2010).

[79] J. Platt. Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines, 1998.

[80] Iulian Neamtiu et al. Bug-fix time prediction models: can we do better?, 2011, MSR '11.

[81] Ahmed E. Hassan et al. An Experience Report on Defect Modelling in Practice: Pitfalls and Challenges, 2017, IEEE/ACM 40th International Conference on Software Engineering: Software Engineering in Practice Track (ICSE-SEIP).

[82] Sven Apel et al. Evolutionary trends of developer coordination: a network approach, 2015, Empirical Software Engineering.

[83] Thomas Zimmermann et al. Extracting structural information from bug reports, 2008, MSR '08.

[84] Yi Zhang et al. Classifying Software Changes: Clean or Buggy?, 2008, IEEE Transactions on Software Engineering.

[85] Emad Shihab et al. Characterizing and predicting blocking bugs in open source projects, 2014, MSR.

[86] Zhenchang Xing et al. Who Will Leave the Company?: A Large-Scale Industry Study of Developer Turnover by Mining Monthly Work Report, 2017, IEEE/ACM 14th International Conference on Mining Software Repositories (MSR).

[87] Tim Menzies et al. Easy over hard: a case study on deep learning, 2017, ESEC/SIGSOFT FSE.

[88] Liang Gong et al. Predicting bug-fixing time: An empirical study of commercial software projects, 2013, 35th International Conference on Software Engineering (ICSE).

[89] Bart Goethals et al. Predicting the severity of a reported bug, 2010, 7th IEEE Working Conference on Mining Software Repositories (MSR 2010).

[90] David Lo et al. ELBlocker: Predicting blocking bugs with ensemble imbalance learning, 2015, Inf. Softw. Technol.

[91] David Lo et al. Duplicate bug report detection with a combination of information retrieval and topic modeling, 2012, 27th IEEE/ACM International Conference on Automated Software Engineering.

[92] M. Coleman et al. A computer readability formula designed for machine scoring, 1975.

[93] Isabelle Guyon et al. An Introduction to Variable and Feature Selection, 2003, J. Mach. Learn. Res.

[94] Tao Zhang et al. Bug Report Enrichment with Application of Automated Fixer Recommendation, 2017, IEEE/ACM 25th International Conference on Program Comprehension (ICPC).

[95] Tian Jiang et al. Personalized defect prediction, 2013, 28th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[96] Petra Perner et al. Data Mining - Concepts and Techniques, 2002, Künstliche Intell.

[97] Leo Breiman et al. Random Forests, 2001, Machine Learning.

[98] David Lo et al. Improving Automated Bug Triaging with Specialized Topic Model, 2017, IEEE Transactions on Software Engineering.

[99] D. Williamson et al. The box plot: a simple visual method to interpret data, 1989, Annals of Internal Medicine.

[100] E. A. Smith et al. Automated readability index, 1967, AMRL-TR, Aerospace Medical Research Laboratories.

[101] Iulian Neamtiu et al. Fine-grained incremental learning and multi-feature tossing graphs to improve bug triaging, 2010, IEEE International Conference on Software Maintenance.

[102] Thomas Zimmermann et al. Towards the next generation of bug tracking systems, 2008, IEEE Symposium on Visual Languages and Human-Centric Computing.

[103] David Lo et al. What are the characteristics of high-rated apps? A case study on free Android Applications, 2015, IEEE International Conference on Software Maintenance and Evolution (ICSME).

[104] Dane Bertram et al. Communication, collaboration, and bugs: the social nature of issue tracking in small, collocated teams, 2010, CSCW '10.

[105] Zarinah Mohd Kasirun et al. Why so complicated? Simple term filtering and weighting for location-based bug report assignment recommendation, 2013, 10th Working Conference on Mining Software Repositories (MSR).

[106] Gail C. Murphy et al. Who should fix this bug?, 2006, ICSE.

[107] N. Cliff. Ordinal methods for behavioral data analysis, 1996.

[108] David Lo et al. Accurate developer recommendation for bug resolution, 2013, 20th Working Conference on Reverse Engineering (WCRE).

[109] Philip J. Guo et al. Characterizing and predicting which bugs get reopened, 2012, 34th International Conference on Software Engineering (ICSE).