Generating Test Data for Insider Threat Detectors

The threat of malicious insider activity continues to be of paramount concern in both the public and private sectors. Though there is great interest in advancing the state of the art in predicting and stopping these threats, the difficulty of obtaining suitable data for research, development, and testing remains a significant hindrance. We outline the use of a synthetic data generator to enable research progress, while discussing the benefits and limitations of synthetic insider threat data, the meaning of realism in this context, comparisons to a hybrid real/synthetic data approach, and future research directions.
