CVE-assisted large-scale security bug report dataset construction method

Abstract

Identifying security bug reports (SBRs) is crucial for eliminating security issues during software development. Machine learning approaches are promising for SBR prediction. However, the effectiveness of state-of-the-art machine learning models depends on high-quality datasets, and gathering large-scale datasets is expensive and tedious. To address this issue, we propose an automated data labeling approach based on iterative voting classification. It starts with a small set of ground-truth training samples, which can be labeled with the help of authoritative vulnerability records hosted in CVE (Common Vulnerabilities and Exposures). The accuracy of the prediction model is then improved with an iterative voting strategy. Using this approach, we label over 80k bug reports from OpenStack and 40k bug reports from Chromium. The correctness of these labels is then manually reviewed by three experienced security testers. Finally, we construct a large-scale SBR dataset with 191 SBRs and 88,472 NSBRs (non-security bug reports) from OpenStack, and we improve the quality of the existing Chromium SBR dataset by identifying 64 new SBRs among previously labeled NSBRs and filtering out 173 noisy bug reports. The shared datasets, together with the proposed dataset construction method, help promote research in SBR prediction.
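The abstract describes the labeling loop only at a high level. The Python sketch below illustrates one possible reading of iterative voting classification, under assumptions of our own: each voter is a multinomial Naive Bayes model trained on a bootstrap resample of the current labeled set, an unlabeled report is accepted only when at least `min_votes` voters agree, and accepted reports join the training set for the next round until no new report is accepted. The names `iterative_voting_label`, `NaiveBayes`, `min_votes`, and the toy seed data are hypothetical; the paper's actual classifiers, voting threshold, and stopping rule are not specified here, and the CVE-based seed labels are simply taken as given.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    """Lower-case bag-of-words tokenizer (ASCII letters only)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing (assumed voter type)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        n = len(labels)
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        v = len(self.vocab)
        words = [w for w in tokenize(doc) if w in self.vocab]  # ignore unseen words

        def log_score(c):
            return self.log_prior[c] + sum(
                math.log((self.counts[c][w] + 1) / (self.totals[c] + v)) for w in words
            )

        return max(self.classes, key=log_score)


def iterative_voting_label(labeled, unlabeled, n_voters=3, min_votes=3,
                           max_iter=10, seed=0):
    """Hypothetical sketch: grow a labeled set by iterative voting.

    labeled:   list of (text, label) seed pairs (e.g. CVE-confirmed SBRs and NSBRs)
    unlabeled: list of bug-report texts
    Returns (pseudo_labels, still_unlabeled).
    """
    if not labeled:
        raise ValueError("need at least one seed sample")
    rng = random.Random(seed)
    train, pool, pseudo = list(labeled), list(unlabeled), []
    for _ in range(max_iter):
        docs = [d for d, _ in train]
        ys = [y for _, y in train]
        # Each voter sees a bootstrap resample of the current training set.
        voters = []
        for _ in range(n_voters):
            idx = [rng.randrange(len(docs)) for _ in docs]
            voters.append(NaiveBayes().fit([docs[i] for i in idx], [ys[i] for i in idx]))
        accepted, remaining = [], []
        for doc in pool:
            label, votes = Counter(v.predict(doc) for v in voters).most_common(1)[0]
            if votes >= min_votes:
                accepted.append((doc, label))
            else:
                remaining.append(doc)
        if not accepted:  # no report reached agreement: stop iterating
            break
        train.extend(accepted)  # pseudo-labeled reports feed the next round
        pseudo.extend(accepted)
        pool = remaining
    return pseudo, pool


seed_reports = [
    ("buffer overflow allows remote code execution", "SBR"),
    ("xss injection in dashboard login form", "SBR"),
    ("token leak exposes credentials", "SBR"),
    ("typo in documentation page", "NSBR"),
    ("button misaligned in dashboard layout", "NSBR"),
    ("unit test fails on slow machine", "NSBR"),
]
new_labels, left_over = iterative_voting_label(
    seed_reports,
    ["remote code execution via crafted request", "documentation typo in install guide"],
)
```

With the default `min_votes=n_voters` the sketch requires unanimity, which labels fewer reports per round but keeps fewer noisy labels; in the workflow the abstract describes, the resulting labels would still go through manual review.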
