Toward a standard benchmark for computer security research: the worldwide intelligence network environment (WINE)

Unlike benchmarks that focus on performance or reliability evaluations, a benchmark for computer security necessarily includes sensitive code and data. Because these artifacts could damage systems or reveal personally identifiable information about the users affected by cyber attacks, publicly disseminating such a benchmark raises several scientific, ethical, and legal challenges. We propose the Worldwide Intelligence Network Environment (WINE), a security-benchmarking approach based on rigorous experimental methods. WINE includes representative field data, collected worldwide from 240,000 sensors, for new empirical studies, and it will enable the validation of research on all phases in the lifecycle of security threats. We tackle the key challenges of security benchmarking by designing a platform for repeatable experimentation on the WINE data sets and by collecting the metadata required for understanding the results. In this paper, we review the unique characteristics of the WINE data, we discuss why rigorous benchmarking will provide fresh insights into the security arms race, and we propose a research agenda for this area.
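
To make the idea of "repeatable experimentation with recorded metadata" concrete, here is a minimal sketch of the kind of provenance record such a platform might keep alongside each analysis run. This is not the WINE API; the snapshot identifier, query string, and row schema below are hypothetical stand-ins, chosen only to illustrate that an experiment becomes repeatable once the exact data snapshot, the analysis performed, and a digest of its output are logged together.

```python
# Hypothetical sketch (not the actual WINE platform API): log the provenance
# metadata needed to re-run and verify an experiment on a shared telemetry snapshot.
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class ExperimentRecord:
    dataset_snapshot: str   # immutable snapshot identifier (assumed naming scheme)
    query: str              # exact analysis query or script that was executed
    started_at: str         # UTC timestamp of the run
    result_digest: str      # checksum of the output, for later comparison


def record_run(snapshot_id: str, query: str, rows: list[dict]) -> ExperimentRecord:
    """Record what would be needed to reproduce a result; `rows` stands in
    for the output of running `query` against `snapshot_id`."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return ExperimentRecord(
        dataset_snapshot=snapshot_id,
        query=query,
        started_at=datetime.now(timezone.utc).isoformat(),
        result_digest=hashlib.sha256(payload).hexdigest(),
    )


if __name__ == "__main__":
    # Hypothetical output of a query counting detections per country.
    fake_rows = [
        {"country": "US", "detections": 1200},
        {"country": "DE", "detections": 340},
    ]
    record = record_run("telemetry-snapshot-2011-03", "SELECT country, COUNT(*) ...", fake_rows)
    print(json.dumps(asdict(record), indent=2))
```

A later run against the same snapshot with the same query should yield the same digest; a mismatch signals that either the data or the analysis changed, which is the property a reproducibility-oriented benchmark needs to expose.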
