Categorizing and Predicting Invalid Vulnerabilities on Common Vulnerabilities and Exposures

To share vulnerability information across separate databases, tools, and services, newly identified vulnerabilities are recurrently reported to Common Vulnerabilities and Exposures (CVE) database.Unfortunately, not all vulnerability reports will be accepted. Some of them might get rejected or be accepted with disputations.In this work, we refer to those rejected or disputed CVEs as invalid vulnerability reports. Invalid vulnerability reports not only cause unnecessary efforts to confirm the vulnerability but also impact the reputation of the software vendors. In this paper, we aim to understand the root causes of invalid vulnerability reports and build a prediction model to automatically identify them.To this end, we first leverage card sorting to categorize invalid vulnerability reports, from which six main reasons are observed for rejected and disputed CVEs, respectively.Then, we propose a text mining approach to predict the invalid vulnerability reports. Our experiments reveal that the proposed text mining approach can achieve an AUC score of 0.87 for predicting invalid vulnerabilities. We also discuss the implications of our study: our categorization can be used to guide new committer to avoid these traps; some root causes of invalid CVEs can be avoided by using automatic techniques or optimizing reviewing mechanism; invalid vulnerability reports data should not be neglected.

[1]  Rahul Telang,et al.  An Empirical Analysis of the Impact of Software Vulnerability Announcements on Firm Stock Price , 2007, IEEE Transactions on Software Engineering.

[2]  Premkumar T. Devanbu,et al.  Recalling the "imprecision" of cross-project defect prediction , 2012, SIGSOFT FSE.

[3]  Yiming Yang,et al.  A Comparative Study on Feature Selection in Text Categorization , 1997, ICML.

[4]  Michael Howard,et al.  Measuring Relative Attack Surfaces , 2005 .

[5]  Wouter Joosen,et al.  Predicting Vulnerable Software Components via Text Mining , 2014, IEEE Transactions on Software Engineering.

[6]  Laurie A. Williams,et al.  Searching for a Needle in a Haystack: Predicting Security Vulnerabilities for Windows Vista , 2010, 2010 Third International Conference on Software Testing, Verification and Validation.

[7]  Laurie A. Williams,et al.  An empirical model to predict security vulnerabilities using code complexity metrics , 2008, ESEM '08.

[8]  Nicholas Jalbert,et al.  Automated duplicate detection for bug tracking systems , 2008, 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN).

[9]  Jian Luo,et al.  A Software Vulnerability Rating Approach Based on the Vulnerability Database , 2014, J. Appl. Math..

[10]  Bart Baesens,et al.  Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings , 2008, IEEE Transactions on Software Engineering.

[11]  David Lo,et al.  Automating Change-Level Self-Admitted Technical Debt Determination , 2019, IEEE Transactions on Software Engineering.

[12]  Wouter Joosen,et al.  Software vulnerability prediction using text analysis techniques , 2012, MetriSec '12.

[13]  Fabrizio Sebastiani,et al.  Machine learning in automated text categorization , 2001, CSUR.

[14]  J. Fleiss Measuring nominal scale agreement among many raters. , 1971 .

[15]  David Lo,et al.  Automated Configuration Bug Report Prediction Using Text Mining , 2014, 2014 IEEE 38th Annual Computer Software and Applications Conference.

[16]  Andreas Zeller,et al.  Predicting vulnerable software components , 2007, CCS '07.

[17]  Ahmed E. Hassan,et al.  An Experience Report on Defect Modelling in Practice: Pitfalls and Challenges , 2017, 2018 IEEE/ACM 40th International Conference on Software Engineering: Software Engineering in Practice Track (ICSE-SEIP).

[18]  Bonita Sharif,et al.  Improving the accuracy of duplicate bug report detection using textual similarity measures , 2014, MSR 2014.

[19]  Michael Howard Improving Software Security by Eliminating the CWE Top 25 Vulnerabilities , 2009, IEEE Security & Privacy.

[20]  Zhenchang Xing,et al.  Learning to Predict Severity of Software Vulnerability Using Only Vulnerability Description , 2017, 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME).

[21]  David Lo,et al.  Early prediction of merged code changes to prioritize reviewing tasks , 2018, Empirical Software Engineering.

[22]  Jaechang Nam,et al.  CLAMI: Defect Prediction on Unlabeled Datasets (T) , 2015, 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[23]  Indrajit Ray,et al.  Measuring, analyzing and predicting security vulnerabilities in software systems , 2007, Comput. Secur..

[24]  Neeraj Suri,et al.  Predictive vulnerability scoring in the context of insufficient information availability , 2013, 2013 International Conference on Risks and Security of Internet and Systems (CRiSIS).

[25]  David Lo,et al.  Chaff from the Wheat: Characterizing and Determining Valid Bug Reports , 2020, IEEE Transactions on Software Engineering.

[26]  Daniele Romano,et al.  Using source code metrics to predict change-prone Java interfaces , 2011, 2011 27th IEEE International Conference on Software Maintenance (ICSM).

[27]  Sushil Jajodia,et al.  An Attack Graph-Based Probabilistic Security Metric , 2008, DBSec.

[28]  David Lo,et al.  Identifying self-admitted technical debt in open source projects using text mining , 2017, Empirical Software Engineering.

[29]  Bart Goethals,et al.  Predicting the severity of a reported bug , 2010, 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010).

[30]  Mohammad Zulkernine,et al.  Using complexity, coupling, and cohesion metrics as early indicators of vulnerabilities , 2011, J. Syst. Archit..

[31]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[32]  David Lo,et al.  Combining Software Metrics and Text Features for Vulnerable File Prediction , 2015, 2015 20th International Conference on Engineering of Complex Computer Systems (ICECCS).

[33]  Mehran Bozorgi,et al.  Beyond heuristics: learning to classify vulnerabilities and predict exploits , 2010, KDD.

[34]  David Lo,et al.  How Android App Developers Manage Power Consumption? - An Empirical Study by Mining Power Management Commits , 2016, 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR).

[35]  Ahmed Tamrawi,et al.  Fuzzy set and cache-based approach for bug triaging , 2011, ESEC/FSE '11.

[36]  Andrew Meneely,et al.  Do Bugs Foreshadow Vulnerabilities? A Study of the Chromium Project , 2015, 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories.

[37]  Laurie A. Williams,et al.  Evaluating Complexity, Code Churn, and Developer Activity Metrics as Indicators of Software Vulnerabilities , 2011, IEEE Transactions on Software Engineering.