Identification of Security Related Bug Reports via Text Mining Using Supervised and Unsupervised Classification

While many prior works have used text mining to automate tasks related to software bug reports, few have considered security aspects. This paper focuses on the automated classification of software bug reports as security related or non-security related, using both supervised and unsupervised approaches. For both approaches, three types of feature vectors are used. For supervised learning, we experiment with multiple classifiers and with training sets of different sizes. Furthermore, we propose a novel unsupervised approach based on anomaly detection. The evaluation is based on three NASA datasets. The results show that supervised classification is affected more by the learning algorithm than by the feature vectors, and that training on only 25% of the data provides results as good as training on 90% of the data. Supervised learning slightly outperforms unsupervised learning, at the expense of labeling the training set. In general, datasets with more security information lead to better performance.
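To make the idea of anomaly detection over text feature vectors concrete, the following is a minimal sketch and not the method evaluated in the paper. The abstract does not specify the three feature vector types or the anomaly detector, so everything here is an assumption made for illustration: plain term-frequency vectors, a corpus centroid as the model of "normal" reports, cosine similarity as the score, and a hypothetical threshold of 0.5. The function names are likewise hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector for one bug report (assumed feature type)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def flag_anomalies(reports, threshold=0.5):
    """Return reports whose similarity to the corpus centroid is below the threshold.

    No labels are used: the centroid is the summed term-frequency vector of all
    reports, on the assumption that unusual (e.g. security related) reports are rare.
    The threshold value is an illustrative assumption, not a tuned parameter.
    """
    centroid = Counter()
    for report in reports:
        centroid.update(tf_vector(report))
    return [r for r in reports if cosine(tf_vector(r), centroid) < threshold]


if __name__ == "__main__":
    # Made-up example reports, not taken from the NASA datasets.
    reports = [
        "telemetry display shows wrong units",
        "telemetry display freezes on startup",
        "display shows wrong telemetry value",
        "telemetry display units wrong after restart",
        "buffer overflow allows unauthorized access",
    ]
    for report in flag_anomalies(reports):
        print("flagged:", report)
```

In this toy corpus only the last report shares no vocabulary with the rest, so it is the one that falls below the threshold. A supervised counterpart would instead fit a classifier on labeled feature vectors, which is the labeling cost the abstract refers to.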
