The Vulnerability Dataset of a Large Software Ecosystem

Security bugs are critical programming errors that can lead to serious vulnerabilities in software. Examining their behaviour and characteristics within a software ecosystem can provide the research community with data on how they evolve and persist, among other properties. We present a dataset produced by applying static analysis to the Maven Central Repository (approximately 265 GB of data) to detect potential security bugs. For our analysis we used FindBugs, a tool that examines Java bytecode to detect numerous types of bugs. The dataset contains the results that FindBugs reports for every project version (a JAR) included in the ecosystem. For every version in our data repository, we also store metadata such as the JAR's size and its dependencies. The dataset can support research on security bugs, as we show with specific examples.
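
To make the per-version records concrete, the following is a minimal sketch of how one FindBugs XML report could be reduced to bug counts and combined with JAR metadata. It is an illustration under stated assumptions, not the pipeline used to build the dataset: the sample report is hand-written, and the helper names (`count_by_category`, `build_record`), the record fields, and the Maven coordinates are hypothetical. It relies only on the `BugInstance` elements and their `category` attribute as they appear in FindBugs XML output.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hand-written fragment in the shape of a FindBugs XML report (illustrative only).
SAMPLE_REPORT = """<BugCollection>
  <BugInstance type="SQL_NONCONSTANT_STRING_PASSED_TO_EXECUTE" priority="1" category="SECURITY"/>
  <BugInstance type="EI_EXPOSE_REP" priority="2" category="MALICIOUS_CODE"/>
  <BugInstance type="NP_NULL_ON_SOME_PATH" priority="1" category="CORRECTNESS"/>
  <BugInstance type="XSS_REQUEST_PARAMETER_TO_SERVLET_WRITER" priority="2" category="SECURITY"/>
</BugCollection>"""


def count_by_category(report_xml):
    """Count reported bug instances per FindBugs category."""
    root = ET.fromstring(report_xml)
    return Counter(bug.get("category", "UNKNOWN") for bug in root.iter("BugInstance"))


def build_record(group_id, artifact_id, version, jar_size_bytes, report_xml):
    """Assemble a hypothetical per-version record: metadata plus bug counts."""
    counts = count_by_category(report_xml)
    return {
        "group_id": group_id,
        "artifact_id": artifact_id,
        "version": version,
        "jar_size_bytes": jar_size_bytes,
        "security_bugs": counts["SECURITY"],
        "bugs_by_category": dict(counts),
    }


if __name__ == "__main__":
    # Placeholder coordinates and size, not taken from the dataset.
    record = build_record("org.example", "demo", "1.0", 12345, SAMPLE_REPORT)
    print(record)
```

Keeping the counts keyed by category, rather than storing only a security total, would let a later analysis compare security bugs against other bug categories across versions of the same project.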
