Bug severity prediction using question-and-answer pairs from Stack Overflow

Abstract: Bugs are common in most software systems. In large-scale software projects, developers usually carry out maintenance tasks with the help of software artifacts such as bug reports. The severity of a bug report describes the impact of the bug and determines how quickly it needs to be fixed. Bug triagers often rely on features such as severity to judge the importance of bug reports and assign them to the appropriate developers. However, the large number of bug reports submitted every day increases the workload of developers, who have to spend more time fixing bugs. In this paper, we collect question-and-answer pairs from Stack Overflow and use logistic regression to predict the severity of bug reports. Specifically, we extract all posts related to bug repositories from Stack Overflow and combine them with bug reports to obtain enhanced versions of the bug reports. We evaluate severity prediction on three popular open-source projects (i.e., Mozilla, Eclipse, and GCC) and compare our approach against Naive Bayes, k-Nearest Neighbor (KNN), and Long Short-Term Memory (LSTM) approaches. The experimental results show that our model predicts severity more accurately than previous studies. Compared with the Naive Bayes-based approach, which performs best among all baselines, our approach improves the average F-measure by 23.03%, 21.86%, and 20.59% for Mozilla, Eclipse, and GCC, respectively.
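
The pipeline described above has two steps. First, bug reports are enriched with Stack Overflow question-and-answer text. Second, a logistic regression model predicts severity from the enriched text. The sketch below is illustrative only and is not the authors' implementation. The abstract does not say how posts are matched to reports, how features are built, or how many severity levels are predicted. This sketch therefore assumes token-overlap (Jaccard) matching, bag-of-words counts, a binary severe/non-severe label, and plain batch gradient descent. All function names and the toy data are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a, b):
    """Jaccard similarity of two token lists (an assumed matching score)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def enrich(report, posts, k=1):
    """Append the k most similar Q&A posts (similarity > 0) to a bug report."""
    rtoks = tokenize(report)
    scored = [(jaccard(rtoks, tokenize(p)), p) for p in posts]
    best = sorted((s for s in scored if s[0] > 0), reverse=True)[:k]
    return " ".join([report] + [p for _, p in best])


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(docs, labels, epochs=200, lr=0.1):
    """Binary logistic regression on bag-of-words counts via batch gradient descent."""
    feats = [Counter(tokenize(d)) for d in docs]
    weights, bias, n = Counter(), 0.0, len(docs)
    for _ in range(epochs):
        grad_w, grad_b = Counter(), 0.0
        for x, y in zip(feats, labels):
            z = bias + sum(weights[t] * c for t, c in x.items())
            err = sigmoid(z) - y  # derivative of the log-loss w.r.t. z
            for t, c in x.items():
                grad_w[t] += err * c
            grad_b += err
        for t, g in grad_w.items():
            weights[t] -= lr * g / n
        bias -= lr * grad_b / n
    return weights, bias


def predict_proba(model, text):
    """Probability that a report is severe under a trained (weights, bias) model."""
    weights, bias = model
    z = bias + sum(weights[t] * c for t, c in Counter(tokenize(text)).items())
    return sigmoid(z)


def predict(model, text, threshold=0.5):
    """Label 1 (severe) if the predicted probability reaches the threshold."""
    return 1 if predict_proba(model, text) >= threshold else 0


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Hypothetical toy data: two Q&A posts and four labeled bug reports (1 = severe).
POSTS = [
    "firefox segfault crash on startup fix by disabling plugin",
    "change toolbar icon color via theme css",
]
TRAIN = [
    ("browser crash on launch", 1),
    ("toolbar icon looks wrong", 0),
    ("segfault when opening tab", 1),
    ("typo in menu label", 0),
]

if __name__ == "__main__":
    docs = [enrich(r, POSTS) for r, _ in TRAIN]
    labels = [y for _, y in TRAIN]
    model = train_logreg(docs, labels)
    preds = [predict(model, d) for d in docs]
    print("train F-measure:", f_measure(labels, preds))
    print("new report severe?", predict(model, enrich("crash after update", POSTS)))
```

Enrichment lets a short report such as "crash after update" borrow vocabulary from a related post, so the classifier sees more of the terms it learned from severe reports. A real setup would likely use a stronger retrieval score and a held-out test set rather than this toy overlap measure.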