Text Filtering and Ranking for Security Bug Report Prediction

Security bug reports describe security-critical vulnerabilities in software products. Bug tracking systems may contain thousands of bug reports, of which relatively few are security related, so finding unlabelled security bugs among them can be challenging. To help security engineers identify these reports quickly and accurately, text-based prediction models have been proposed. Such models often mislabel security bug reports for several reasons, including class imbalance, where the ratio of non-security to security bug reports is very high. More critically, we have observed that the presence of security-related keywords in both security and non-security bug reports can cause security bug reports to be mislabelled. This paper proposes FARSEC, a framework for filtering and ranking bug reports that reduces the misleading influence of security-related keywords appearing in non-security reports. Before building prediction models, the framework identifies and removes non-security bug reports that contain security-related keywords. We demonstrate that FARSEC improves the performance of text-based prediction models for security bug reports in 90 percent of cases. Specifically, we evaluate it on 45,940 bug reports from Chromium and four Apache projects. With our framework, we mitigate the class imbalance issue and reduce the number of mislabelled security bug reports by 38 percent.
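To make the filtering step concrete, the Python below is a minimal sketch of the general idea, not the paper's implementation: mine security-related keywords from the labelled security bug reports, score each non-security report by how many of those keywords it contains, and drop the non-security reports that look too "security-like" before training a classifier. The tokenizer, the keyword-selection rule (top-k most frequent terms), the overlap-based score, and the 0.3 threshold are all illustrative assumptions.

```python
# Hypothetical sketch of keyword-based filtering of noisy non-security
# bug reports. Function names, the scoring rule, and the threshold are
# illustrative assumptions, not FARSEC's exact formulation.
from collections import Counter
import re

def tokenize(text):
    """Lowercase alphabetic word tokens; deliberately simple."""
    return re.findall(r"[a-z]+", text.lower())

def top_security_keywords(security_reports, k=100):
    """The k terms appearing in the most labelled security bug reports."""
    counts = Counter(t for r in security_reports for t in set(tokenize(r)))
    return {term for term, _ in counts.most_common(k)}

def keyword_score(report, keywords):
    """Fraction of a report's distinct tokens that are security keywords."""
    tokens = set(tokenize(report))
    return len(tokens & keywords) / len(tokens) if tokens else 0.0

def filter_noisy_nonsecurity(nonsecurity_reports, keywords, threshold=0.3):
    """Drop non-security reports that score highly on security keywords;
    these are the reports most likely to mislead a text classifier."""
    return [r for r in nonsecurity_reports
            if keyword_score(r, keywords) < threshold]

# Example usage: keywords = top_security_keywords(sec_reports)
# train_neg = filter_noisy_nonsecurity(nonsec_reports, keywords)
```

In a sketch like this, the prediction model would then be trained on the security reports together with the filtered non-security reports, which both reduces the keyword overlap described above and softens the class imbalance.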
