Detecting insider threats in a real corporate database of computer usage activity

This paper reports on the methods and results of an applied research project by a team consisting of SAIC and four universities to develop, integrate, and evaluate new approaches to detect the weak signals characteristic of insider threats on organizations' information systems. Our system combines structural and semantic information from a real corporate database of monitored activity on users' computers to detect independently developed red team inserts of malicious insider activities. We have developed and applied multiple algorithms for anomaly detection based on suspected scenarios of malicious insider behavior, indicators of unusual activities, high-dimensional statistical patterns, temporal sequences, and normal graph evolution. Algorithms and representations for dynamic graph processing provide the ability to scale as needed for enterprise-level deployments on real-time data streams. We have also developed a visual language for specifying combinations of features, baselines, peer groups, time periods, and algorithms to detect anomalies suggestive of instances of insider threat behavior. We defined over 100 data features in seven categories based on approximately 5.5 million actions per day from approximately 5,500 users. On two months of real data, we achieved area under the ROC curve values of up to 0.979 and lift values of 65 on the top 50 user-days identified.
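
The two evaluation measures quoted above can be illustrated with a minimal sketch. It assumes the common definitions: area under the ROC curve as the probability that a randomly chosen positive user-day outscores a randomly chosen negative one, and lift at k as the fraction of positives among the k highest-scored user-days divided by the overall fraction of positives. The abstract does not spell out its exact definitions, and the function names `roc_auc` and `lift_at_k` and the toy data are hypothetical, not taken from the paper.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Assumed definition, not the paper's code.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def lift_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Precision among the k highest-scored items divided by the base rate."""
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of items")
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined without any positives")
    # Rank item indices from highest to lowest anomaly score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    precision = sum(labels[i] for i in order[:k]) / k
    return precision / base_rate


# Toy data: ten user-days, two of which are (hypothetical) red team inserts.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

auc = roc_auc(toy_scores, toy_labels)
lift = lift_at_k(toy_scores, toy_labels, k=2)
```

Under these definitions, a lift of 65 on the top 50 user-days would mean that those 50 contain about 65 times as many inserted malicious user-days as a random sample of 50 would be expected to contain.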
