Automatic Bug Triage in Software Systems Using Graph Neighborhood Relations for Feature Augmentation

Bug triaging is the process of prioritizing bugs by severity, frequency, and risk so that they can be assigned to appropriate developers for validation and resolution. This article introduces a graph-based feature augmentation approach for enhancing machine-learning-based bug triaging systems. The approach uses graph partitioning based on neighborhood overlap, a measure that is effective for discovering relationships in social graphs. Terms of bug summaries are represented as nodes in a graph, which is then partitioned into clusters of terms. Terms in strong clusters are added to the original feature vectors of bug summaries based on the similarity between the terms in each cluster and a bug summary. We also employed other techniques, namely term frequency, term correlation, and topic modeling, to identify latent terms and add them to the original feature vectors of bug summaries. We then combined the frequency, correlation, and neighborhood overlap techniques into another feature augmentation approach that enriches the feature vectors of bug summaries for use in bug triaging. The augmented vectors are used to classify bug reports into different priorities; bug triage in this context means correctly recognizing the priority of new bugs. Several classification algorithms are tested using the proposed methods. Experimental results on a data set of Eclipse bug reports extracted from the Bugzilla tracking system show that our approach outperformed existing bug triaging systems, including modern techniques that utilize deep learning.
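To make the neighborhood-overlap idea concrete, the following is a minimal Python sketch, not the article's implementation. The abstract does not specify how edges are formed, how the graph is partitioned, or which similarity measure is used, so the sketch assumes the following: two terms are linked when they co-occur in a summary, neighborhood overlap is the standard ratio of shared neighbors to all neighbors of the two endpoints (excluding the endpoints themselves), clusters are the connected components left after dropping weak edges, and cluster-to-summary similarity is Jaccard similarity. All function names, the thresholds `min_overlap` and `min_similarity`, and the toy summaries are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_term_graph(summaries):
    """Link two terms when they co-occur in at least one summary (assumption)."""
    graph = defaultdict(set)
    for terms in summaries:
        for a, b in combinations(sorted(set(terms)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def neighborhood_overlap(graph, u, v):
    """Shared neighbors of u and v divided by all their neighbors except u, v."""
    shared = graph[u] & graph[v]
    union = (graph[u] | graph[v]) - {u, v}
    return len(shared) / len(union) if union else 0.0


def strong_clusters(graph, min_overlap):
    """Keep edges with overlap >= min_overlap; return connected components."""
    strong = defaultdict(set)
    for u in list(graph):
        for v in graph[u]:
            if u < v and neighborhood_overlap(graph, u, v) >= min_overlap:
                strong[u].add(v)
                strong[v].add(u)
    clusters, seen = [], set()
    for start in list(strong):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(strong[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


def augment(terms, clusters, min_similarity):
    """Add the terms of every cluster whose Jaccard similarity to the summary
    reaches min_similarity (the similarity measure is an assumption)."""
    terms = set(terms)
    added = set()
    for cluster in clusters:
        similarity = len(terms & cluster) / len(terms | cluster)
        if similarity >= min_similarity:
            added |= cluster - terms
    return terms | added


# Toy, made-up bug summaries already split into terms.
summaries = [
    ["null", "pointer", "exception"],
    ["null", "pointer", "crash"],
    ["editor", "font", "render"],
    ["editor", "font", "zoom"],
]
graph = build_term_graph(summaries)
clusters = strong_clusters(graph, min_overlap=0.6)
augmented = augment(["null", "dereference"], clusters, min_similarity=0.3)
```

In this toy data, "null" and "pointer" share all of their other neighbors, so their edge survives the threshold, and a new summary mentioning "null" picks up "pointer" as an extra feature. The enriched term set would then be vectorized and passed to a priority classifier, a step the sketch leaves out.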
