The Challenges of Labeling Vulnerability-Contributing Commits

Software projects developed using version control are enhanced incrementally through commits, some of which inevitably introduce security vulnerabilities. The features of these vulnerability-contributing commits (VCCs) could be used to train a VCC detector or to inform software development best practices. Previous work has attempted to label VCCs in open-source software projects for this purpose. We present a manual approach to VCC labeling using the fix commits listed in Common Vulnerabilities and Exposures (CVE) entries. We show that a published automated method of VCC labeling disagrees with our manual method on 42% of VCCs. We argue that the automated method, while effective at scaling VCC labeling, is therefore not sufficiently accurate. Finally, we discuss the benefits and drawbacks of predicting vulnerable software components rather than VCCs.
