Which Packages Would be Affected by This Bug Report?

A large project (e.g., Ubuntu) usually contains a large number of software packages. Sometimes the same bug report in such a project affects multiple packages, and the developers of those packages need to collaborate to fix the bug. Because a project like Ubuntu involves so many packages, manually identifying the packages affected by a bug report is time-consuming. In this paper, we propose an approach named PkgRec that consists of two components: a name matching component and an ensemble learning component. In the name matching component, we assign a confidence score to a package if it is mentioned by a bug report. In the ensemble learning component, we divide the training dataset into n subsets and build a sub-classifier on each subset. We then automatically determine an appropriate weight for each sub-classifier and combine the sub-classifiers to predict the confidence score of a package being affected by a new bug report. Finally, PkgRec combines the name matching component and the ensemble learning component to assign a final confidence score to each potential package, and the top-k packages with the highest confidence scores are recommended. We evaluate PkgRec on three datasets (Ubuntu, OpenStack, and GNOME) containing a total of 42,094 bug reports. PkgRec achieves recall@5 scores of 0.511 to 0.737 and recall@10 scores of 0.614 to 0.785. We also compare PkgRec with two state-of-the-art approaches, LDA-KL and MLkNN. The experimental results show that, on average, PkgRec improves recall@5 and recall@10 by 47% and 31% over LDA-KL, and by 52% and 37% over MLkNN.
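The abstract describes PkgRec only at the level of its components, so the Python sketch below is an illustration of how such a pipeline could fit together, not the authors' implementation. Everything beyond the component structure is a hypothetical choice made for this example: the tokenizer, the centroid-based sub-classifier (`CentroidClassifier`), the round-robin split into n subsets, weighting each sub-classifier by its recall@k on a validation set, and the mixing parameter `alpha`. Recall@k uses the common definition (fraction of the truly affected packages that appear among the top k recommendations), and each report is assumed to affect at least one package.

```python
import math
import re
from collections import Counter, defaultdict

# Sketch of a PkgRec-style recommender. The sub-classifier, split strategy,
# weighting rule, and alpha are hypothetical choices, not the paper's.

def tokenize(text):
    """Lower-case tokens; '-' and '+' are kept so names like 'network-manager' survive."""
    return re.findall(r"[a-z0-9][a-z0-9+\-]*", text.lower())

def name_match_scores(report, packages):
    """Name matching component: confidence 1.0 if the package name appears as a token."""
    tokens = set(tokenize(report))
    return {p: 1.0 if p.lower() in tokens else 0.0 for p in packages}

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class CentroidClassifier:
    """Hypothetical sub-classifier: one term-frequency profile per package."""

    def __init__(self, training):
        self.profiles = defaultdict(Counter)
        for text, labels in training:
            counts = Counter(tokenize(text))
            for p in labels:
                self.profiles[p].update(counts)

    def scores(self, report, packages):
        query = Counter(tokenize(report))
        return {p: cosine(query, self.profiles[p]) if p in self.profiles else 0.0
                for p in packages}

def top_k(scores, k):
    """Packages sorted by descending score (ties broken by name), truncated to k."""
    return [p for p, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]

def recall_at_k(ranked, actual, k):
    """Fraction of the truly affected packages that appear in the top-k list."""
    return len(set(ranked[:k]) & set(actual)) / len(actual)

def mean_recall_at_k(score_fn, data, packages, k):
    """Average recall@k of a scoring function over (text, labels) pairs."""
    if not data:
        return 0.0
    return sum(recall_at_k(top_k(score_fn(text, packages), k), labels, k)
               for text, labels in data) / len(data)

def build_ensemble(training, validation, packages, n, k):
    """Ensemble component: n sub-classifiers weighted by validation recall@k (assumption)."""
    subsets = [training[i::n] for i in range(n)]  # round-robin split into n subsets
    members = [CentroidClassifier(s) for s in subsets]
    weights = [mean_recall_at_k(m.scores, validation, packages, k) for m in members]
    total = sum(weights)
    if total == 0:  # no member is useful on validation: fall back to uniform weights
        weights, total = [1.0] * n, float(n)
    return [(m, w / total) for m, w in zip(members, weights)]

def pkgrec_scores(report, packages, ensemble, alpha=0.5):
    """Final score = alpha * name match + (1 - alpha) * weighted ensemble score."""
    name = name_match_scores(report, packages)
    ens = dict.fromkeys(packages, 0.0)
    for member, weight in ensemble:
        for p, s in member.scores(report, packages).items():
            ens[p] += weight * s
    return {p: alpha * name[p] + (1 - alpha) * ens[p] for p in packages}

if __name__ == "__main__":
    train = [
        ("nautilus crashes when opening a folder", ["nautilus"]),
        ("gedit freezes on large file", ["gedit"]),
        ("file manager crash copying files", ["nautilus"]),
        ("text editor undo broken", ["gedit"]),
        ("nautilus thumbnail crash in gvfs", ["nautilus", "gvfs"]),
        ("gvfs mount fails over sftp", ["gvfs"]),
    ]
    valid = [("file manager crashes on folder", ["nautilus"])]
    packages = ["nautilus", "gedit", "gvfs"]
    ensemble = build_ensemble(train, valid, packages, n=2, k=1)
    report = "nautilus crashes when mounting via gvfs"
    print(top_k(pkgrec_scores(report, packages, ensemble), k=2))
```

Weighting sub-classifiers by held-out recall@k is only one simple way to "automatically determine" weights; because each member is used only through its `scores` method, a stronger base learner could replace `CentroidClassifier` without changing the rest of the sketch.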
