When Do Changes Induce Software Vulnerabilities?

Version control systems (VCSs) have almost become the de facto standard for the management of open-source projects and the development of their source code. In VCSs, source code which can potentially be vulnerable is introduced to a system through what are so called commits. Vulnerable commits force the system into an insecure state. The farreaching impact of vulnerabilities attests to the importance of identifying and understanding the characteristics of prior vulnerable changes (or commits), in order to detect future similar ones. The concept of change classification was previously studied in the literature of bug detection to identify commits with defects. In this paper, we borrow the notion of change classification from the literature of defect detection to further investigate its applicability to vulnerability detection problem using semi-supervised learning. In addition, we also experiment with new vulnerability predictors, and compare the predictive power of our proposed features with vulnerability prediction techniques based on text mining. The experimental results show that our semi-supervised approach holds promise in improving change classification effectiveness by leveraging unlabeled data.

[1]  Andreas Zeller,et al.  Predicting vulnerable software components , 2007, CCS '07.

[2]  Rasoul Samad Zadeh Kaljahi,et al.  Adapting Self-Training for Semantic Role Labeling , 2010, ACL.

[3]  Konstantin Serebryany,et al.  MemorySanitizer: Fast detector of uninitialized memory use in C++ , 2015, 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).

[4]  Saumya K. Debray,et al.  Automatic Simplification of Obfuscated JavaScript Code: A Semantics-Based Approach , 2012, 2012 IEEE Sixth International Conference on Software Security and Reliability.

[5]  Laurie A. Williams,et al.  Evaluating Complexity, Code Churn, and Developer Activity Metrics as Indicators of Software Vulnerabilities , 2011, IEEE Transactions on Software Engineering.

[6]  Laurie A. Williams,et al.  An empirical model to predict security vulnerabilities using code complexity metrics , 2008, ESEM '08.

[7]  Daniel M. Germán,et al.  Continuously mining distributed version control systems: an empirical study of how Linux uses Git , 2014, Empirical Software Engineering.

[8]  Sebastian Thrun,et al.  Text Classification from Labeled and Unlabeled Documents using EM , 2000, Machine Learning.

[9]  Nimal Nissanke,et al.  Component security - issues and an approach , 2005, 29th Annual International Computer Software and Applications Conference (COMPSAC'05).

[10]  Mohammad Zulkernine,et al.  Using complexity, coupling, and cohesion metrics as early indicators of vulnerabilities , 2011, J. Syst. Archit..

[11]  Thomas Zimmermann,et al.  Automatic Identification of Bug-Introducing Changes , 2006, 21st IEEE/ACM International Conference on Automated Software Engineering (ASE'06).

[12]  Laurie A. Williams,et al.  Searching for a Needle in a Haystack: Predicting Security Vulnerabilities for Windows Vista , 2010, 2010 Third International Conference on Software Testing, Verification and Validation.

[13]  Lwin Khin Shar,et al.  Predicting common web application vulnerabilities from input validation and sanitization code patterns , 2012, 2012 Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering.

[14]  Christopher Krügel,et al.  Detection and analysis of drive-by-download attacks and malicious JavaScript code , 2010, WWW '10.

[15]  Arvind Narayanan,et al.  De-anonymizing Programmers via Code Stylometry , 2015, USENIX Security Symposium.

[16]  Riccardo Scandariato,et al.  Predicting Vulnerable Components: Software Metrics vs Text Mining , 2014, 2014 IEEE 25th International Symposium on Software Reliability Engineering.

[17]  Miguel Correia,et al.  Automatic detection and correction of web application vulnerabilities using data mining to predict false positives , 2014, WWW.

[18]  Yi Zhang,et al.  Classifying Software Changes: Clean or Buggy? , 2008, IEEE Transactions on Software Engineering.

[19]  Laurie A. Williams,et al.  Is complexity really the enemy of software security? , 2008, QoP '08.

[20]  Wouter Joosen,et al.  Software vulnerability prediction using text analysis techniques , 2012, MetriSec '12.

[21]  Matthew Smith,et al.  VCCFinder: Finding Potential Vulnerabilities in Open-Source Projects to Assist Code Audits , 2015, CCS.

[22]  Laurie A. Williams,et al.  Can traditional fault prediction models be used for vulnerability prediction? , 2011, Empirical Software Engineering.

[23]  Wouter Joosen,et al.  Predicting Vulnerable Software Components via Text Mining , 2014, IEEE Transactions on Software Engineering.