Bridging the Gap: A Pragmatic Approach to Generating Insider Threat Data

The threat of malicious insider activity continues to be of paramount concern in both the public and private sectors. Although there is great interest in advancing the state of the art in predicting and stopping these threats, the difficulty of obtaining suitable data for research, development, and testing remains a significant hindrance. We outline the use of synthetic data to enable progress in one research program, and we discuss the benefits and limitations of synthetic insider threat data, the meaning of realism in this context, and future research directions.
