Automatically Identifying Security Bug Reports via Multitype Features Analysis

Bug-tracking systems are widely used by software developers to manage bug reports. Since it is time-consuming and costly to fix all bugs, developers usually pay more attention to bugs with higher impact, such as security bugs (i.e., vulnerabilities), which can be exploited by malicious users to launch attacks and cause great damage. However, manually identifying security bug reports among the millions of reports in bug-tracking systems is difficult and error-prone. Furthermore, existing automated approaches to identifying security bug reports often incur many false negatives, leaving security bugs unnoticed and posing a hidden danger to the system. To address this important problem, we present an automatic security bug report identification model based on multitype feature analysis, dubbed Security Bug Report Identifier (SBRer). Specifically, we make use of multiple kinds of information contained in a bug report, including meta features and textual features, to automatically identify security bug reports via natural language processing and machine learning techniques. The experimental results show that SBRer with imbalanced data processing can successfully identify security bug reports with a precision of 99.4% and a recall of 79.9%, substantially outperforming existing work.
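As a rough illustration of the general idea only (not the paper's actual SBRer pipeline), the sketch below combines TF-IDF textual features with one-hot-encoded meta features and trains a class-weighted classifier to cope with the scarcity of security bug reports. The column names (summary, component, priority), the toy data, the use of scikit-learn, logistic regression, and class weighting as the imbalance-handling strategy are all assumptions made for illustration.

# Minimal sketch: combine textual and meta features of bug reports and train a
# classifier with class weighting to compensate for the rarity of security
# bug reports. Column names, toy data, and classifier choice are illustrative
# assumptions, not the actual SBRer design.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy bug-report data: one textual field plus two meta fields (hypothetical).
reports = pd.DataFrame({
    "summary": [
        "Buffer overflow when parsing crafted PNG header",
        "Button label misaligned on settings page",
        "SQL injection possible via search parameter",
        "Crash on startup after upgrading dependency",
        "Use-after-free in network packet handler",
        "Typo in documentation for install command",
    ],
    "component": ["parser", "ui", "web", "core", "network", "docs"],
    "priority": ["P1", "P3", "P1", "P2", "P1", "P4"],
    "is_security": [1, 0, 1, 0, 1, 0],
})

features = ColumnTransformer([
    # Textual features: TF-IDF over the report summary.
    ("text", TfidfVectorizer(ngram_range=(1, 2)), "summary"),
    # Meta features: one-hot encode categorical fields.
    ("meta", OneHotEncoder(handle_unknown="ignore"), ["component", "priority"]),
])

model = Pipeline([
    ("features", features),
    # class_weight="balanced" is one simple way to handle class imbalance;
    # the paper's imbalanced data processing may differ.
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

X_train, X_test, y_train, y_test = train_test_split(
    reports[["summary", "component", "priority"]],
    reports["is_security"],
    test_size=0.33,
    random_state=0,
    stratify=reports["is_security"],
)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test), zero_division=0))

In practice the textual features would be drawn from the full report description and the meta features from the tracker's structured fields; precision and recall on a held-out set would then be compared against the baselines reported in the paper.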
