Classifier Suites for Insider Threat Detection

Better methods to detect insider threats need new anticipatory analytics to capture risky behavior prior to losing data. In search of the best overall classifier, this work empirically scores 88 machine learning algorithms in 16 major families. We extract risk features from the large CERT dataset, which blends real network behavior with individual threat narratives. We discover the predictive importance of measuring employee sentiment. Among major classifier families tested on CERT, the random forest algorithms offer the best choice, with different implementations scoring over 98% accurate. In contrast to more obscure or black-box alternatives, random forests are ensembles of many decision trees and thus offer a deep but human-readable set of detection rules (>2000 rules). We address performance rankings by penalizing long execution times against higher median accuracies using cross-fold validation. We address the relative rarity of threats as a case of low signal-to-noise (< 0.02% malicious to benign activities), and then train on both under-sampled and over-sampled data which is statistically balanced to identify nefarious actors.

[1]  Dawn M. Cappelli,et al.  The CERT Guide to Insider Threats: How to Prevent, Detect, and Respond to Information Technology Crimes , 2012 .

[2]  Brian Hutchinson,et al.  Deep Learning for Unsupervised Insider Threat Detection in Structured Cybersecurity Data Streams , 2017, AAAI Workshops.

[3]  Senén Barro,et al.  Do we need hundreds of classifiers to solve real world classification problems? , 2014, J. Mach. Learn. Res..

[4]  Alexander Liu,et al.  AI Lessons Learned from Experiments in Insider Threat Detection , 2006, AAAI Spring Symposium: What Went Wrong and Why: Lessons from AI Research and Applications.

[5]  Max Kuhn,et al.  Building Predictive Models in R Using the caret Package , 2008 .

[6]  Frank L. Greitzer,et al.  Identifying At-Risk Employees: Modeling Psychosocial Precursors of Potential Insider Threats , 2012, 2012 45th Hawaii International Conference on System Sciences.

[7]  Frank L. Greitzer,et al.  Modeling Human Behavior to Anticipate Insider Attacks , 2011 .

[8]  Elizabeth D. Liddy,et al.  Semantic Analysis for Monitoring Insider Threats , 2004, ISI.

[9]  Frederick T. Sheldon,et al.  Anomaly detection in multiple scale for insider threat analysis , 2011, CSIIRW '11.

[10]  Oliver Brdiczka,et al.  Proactive Insider Threat Detection through Graph Learning and Psychological Context , 2012, 2012 IEEE Symposium on Security and Privacy Workshops.

[11]  Hung Q. Ngo,et al.  Towards a theory of insider threat assessment , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).

[12]  Rich Caruana,et al.  An empirical comparison of supervised learning algorithms , 2006, ICML.

[13]  Deborah A. Frincke,et al.  Combining Traditional Cyber Security Audit Data with Psychosocial Data: Towards Predictive Modeling for Insider Threat Mitigation , 2010, Insider Threats in Cyber Security.

[14]  Matthew L Collins,et al.  Insider Threat Indicator Ontology , 2016 .

[15]  Kurt C. Wallnau,et al.  Generating Test Data for Insider Threat Detectors , 2014, J. Wirel. Mob. Networks Ubiquitous Comput. Dependable Appl..

[16]  Witold R. Rudnicki,et al.  Feature Selection with the Boruta Package , 2010 .

[17]  Lawrence B. Holder,et al.  Graph-based approaches to insider threat detection , 2009, CSIIRW '09.

[18]  Christian W. Probst,et al.  Insiders and Insider Threats - An Overview of Definitions and Mitigation Techniques , 2011, J. Wirel. Mob. Networks Ubiquitous Comput. Dependable Appl..

[19]  Mudita Singhal,et al.  Supervised and Unsupervised methods to detect Insider Threat from Enterprise Social and Online Activity Data , 2015, J. Wirel. Mob. Networks Ubiquitous Comput. Dependable Appl..

[20]  Thomas G. Dietterich,et al.  Systematic construction of anomaly detection benchmarks from real data , 2013, ODD '13.

[21]  Kathryn B. Laskey,et al.  Detecting Threatening Behavior Using Bayesian Networks , 2006 .