Toward Realistic and Artifact-Free Insider-Threat Data

Progress in insider-threat detection is currently limited by a lack of realistic, publicly available data. For reasons of privacy and confidentiality, organizations are reluctant to expose their sensitive data to the research community. Data can be sanitized to mitigate these concerns, but the act of sanitizing may itself introduce artifacts that compromise the data's utility for research. If sanitization artifacts change the results of insider-threat experiments, those results could lead to conclusions that do not hold in the real world. The goal of this work is to investigate the effect of sanitization artifacts on insider-threat detection experiments. We assemble a suite of tools and present a methodology for collecting and sanitizing data. We use these tools and methods in an experimental evaluation of an insider-threat detection system. We compare the results of the evaluation on raw data with the results on each of three types of sanitized data, and we measure the effect of each sanitization strategy. We establish that two of the three sanitization strategies alter the results of the experiment. Because these two strategies are commonly used in practice, the consequences of sanitization artifacts for insider-threat research are a real concern. The third sanitization strategy, by contrast, does not alter the results, indicating that realistic, artifact-free data sets can be created with appropriate tools and methods.
