Security Trend Analysis with CVE Topic Models

We study the vulnerability reports in the Common Vulnerabilities and Exposures (CVE) database by applying topic models to their description texts, which lets us find prevalent vulnerability types and emerging trends semi-automatically. In our study of the 39,393 unique CVEs up to the end of 2009, we identify the following trends, given here in the form of a weather forecast: PHP: declining, with occasional SQL injection. Buffer Overflows: flattening out after decline. Format Strings: in steep decline. SQL Injection and XSS: remaining strong, and rising. Cross-Site Request Forgery: perhaps a sleeping giant, stirring. Application Servers: rising steeply.
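The abstract does not state which topic model, hyperparameters, or trend measure are used. As a minimal sketch, assuming plain LDA fitted by collapsed Gibbs sampling and a trend measure equal to the average topic proportion of all descriptions published in a given year, the pipeline could look like the code below. The function names, stopword list, hyperparameters, and toy descriptions are hypothetical and are not CVE data or the authors' settings. A production setup would more likely use a library LDA implementation (for example gensim's `LdaModel`) and a stemmer on real CVE descriptions.

```python
import random
from collections import defaultdict

# Hypothetical stopword list; a real pipeline would use a fuller list and stemming.
STOPWORDS = {"in", "the", "a", "allows", "via", "to", "of"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (sketch). Returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    word_id = {w: i for i, w in enumerate(vocab)}
    # Count tables: document-topic, topic-word, and per-topic totals.
    ndk = [[0] * num_topics for _ in docs]
    nkw = [[0] * V for _ in range(num_topics)]
    nk = [0] * num_topics
    assignments = []
    for d, doc in enumerate(docs):  # random initial topic for every token
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        assignments.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    # Posterior mean of each document's topic proportions (each row sums to 1).
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        theta.append([(ndk[d][t] + alpha) / denom for t in range(num_topics)])
    return theta


def yearly_prevalence(theta, years):
    """Average topic proportions over all documents published in each year."""
    sums = defaultdict(lambda: [0.0] * len(theta[0]))
    counts = defaultdict(int)
    for props, year in zip(theta, years):
        for t, p in enumerate(props):
            sums[year][t] += p
        counts[year] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sorted(sums)}


if __name__ == "__main__":
    # Toy, made-up descriptions for illustration only.
    toy = [
        ("Buffer overflow in the FTP server allows remote code execution", 2001),
        ("Stack buffer overflow via a long filename", 2001),
        ("SQL injection in login.php via the user parameter", 2008),
        ("Cross-site scripting in search.php via the q parameter", 2008),
    ]
    docs = [tokenize(text) for text, _ in toy]
    theta = lda_gibbs(docs, num_topics=2, iters=100)
    for year, props in yearly_prevalence(theta, [y for _, y in toy]).items():
        print(year, [round(p, 2) for p in props])
```

Plotting each topic's yearly prevalence over time is what turns the fitted model into statements such as "in steep decline" or "rising steeply"; the sampler's output is stochastic, so the exact numbers printed depend on the seed.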
