Software Metrics and Security Vulnerabilities: Dataset and Exploratory Study

Code with certain characteristics is more prone to security vulnerabilities. In fact, studies show that code that does not follow best practices is harder to verify and maintain, and is consequently more likely to contain vulnerabilities that go unnoticed or are inadvertently introduced. In this experience report, we study whether software metrics can reflect such characteristics and thus correlate with the existence of vulnerabilities. The analysis is based on 2875 security patches, which were used to build a dataset of metrics and vulnerabilities for all the functions, classes and files of 5750 versions of five widely used projects that are exposed to attacks: Linux Kernel, Mozilla, Xen Hypervisor, httpd and glibc. We calculated software metrics from their sources and applied correlation algorithms and statistical tests to these metrics in order to identify relations between them and the existing vulnerabilities. Results show that software metrics are able to discriminate between vulnerable and non-vulnerable functions, but it is not possible to find strong correlations between these metrics and the number of vulnerabilities in the analyzed functions. Finally, the results indicate that vulnerable functions are likely to have other vulnerabilities in the future.
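The abstract does not name the specific correlation algorithm or statistical tests used. As a minimal sketch of this kind of analysis, the Python code below assumes Spearman's rank correlation (metric value against number of vulnerabilities) and the Mann-Whitney U statistic (metric values of vulnerable against non-vulnerable functions). The function names and the sample values are hypothetical and are not taken from the study's dataset.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: count of (a, b) pairs with a > b, ties count 0.5."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    rank_sum_a = sum(ranks[:n_a])
    return rank_sum_a - n_a * (n_a + 1) / 2


if __name__ == "__main__":
    # Hypothetical per-function data: a complexity metric and vulnerability counts.
    complexity = [3, 12, 7, 25, 4, 18]
    vuln_count = [0, 1, 0, 2, 0, 1]
    print("Spearman rho:", spearman(complexity, vuln_count))

    # Split the same metric by whether the function has any vulnerability.
    vulnerable = [c for c, v in zip(complexity, vuln_count) if v > 0]
    clean = [c for c, v in zip(complexity, vuln_count) if v == 0]
    print("Mann-Whitney U:", mann_whitney_u(vulnerable, clean))
```

The two measures answer different questions, which mirrors the distinction the abstract draws: a U statistic far from half the number of pairs suggests the metric separates the two groups, while rho measures how well the metric tracks the vulnerability count itself. A significance test would additionally need a p-value, which this sketch omits.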
