Identifying Security Bug Reports Based Solely on Report Titles and Noisy Data

Identifying security bug reports (SBRs) is a vital step in the software development life cycle. Supervised machine-learning approaches usually assume that entire bug reports are available for training and that their labels are noise-free. To the best of our knowledge, this is the first study to show that accurate label prediction is possible for SBRs even when only the title is available and in the presence of label noise.
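The abstract does not name a specific model, so the sketch below only illustrates the setting it describes and is not the authors' method. It assumes binary labels (1 = SBR, 0 = non-SBR), a multinomial Naive Bayes classifier trained on title tokens alone, and symmetric label noise simulated by flipping each training label independently with a fixed probability. The helpers `tokenize`, `flip_labels`, and `TitleNaiveBayes`, as well as the toy titles, are hypothetical.

```python
import math
import random
import re
from collections import Counter


def tokenize(title):
    # Hypothetical helper: lowercase the title and keep alphanumeric runs as tokens.
    return re.findall(r"[a-z0-9]+", title.lower())


def flip_labels(labels, noise_rate, seed=0):
    # Hypothetical helper simulating symmetric (class-independent) label noise:
    # each binary label is flipped independently with probability noise_rate.
    rng = random.Random(seed)
    return [1 - y if rng.random() < noise_rate else y for y in labels]


class TitleNaiveBayes:
    # Hypothetical multinomial Naive Bayes over title tokens, with Laplace smoothing.
    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, titles, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for title, y in zip(titles, labels):
            self.word_counts[y].update(tokenize(title))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, title):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus smoothed log likelihood of each known title token.
            score = math.log(self.doc_counts[c] / n_docs)
            for tok in tokenize(title):
                if tok not in self.vocab:
                    continue  # ignore tokens never seen during training
                count = self.word_counts[c][tok]
                score += math.log((count + self.alpha) / (self.totals[c] + self.alpha * v))
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Toy, hand-written titles for illustration only (not data from the study).
titles = [
    "Buffer overflow in image parser",
    "XSS vulnerability in login form",
    "SQL injection via search field",
    "Crash when opening large file",
    "Typo in settings dialog",
    "Button misaligned on toolbar",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = security bug report
noisy = flip_labels(labels, noise_rate=0.2, seed=42)
model = TitleNaiveBayes().fit(titles, noisy)
print(model.predict("Heap overflow in font parser"))
```

Naive Bayes is used here only because it is simple and works directly on short token lists such as titles; a study of noise robustness would compare several learners and noise rates on real labeled reports.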
