Insider Threat Identification Using the Simultaneous Neural Learning of Multi-Source Logs

Insider threat detection has drawn increasing attention in recent years. A malicious insider’s digital footprints are scattered across a wide range of audit data sources and accumulate over a long period of time. To capture them, existing approaches often rely on a scoring mechanism to orchestrate the alerts generated by multiple sub-detectors, or require feature engineering based on domain knowledge to conduct a one-off analysis across multiple types of data. These approaches are complex to deploy and incur additional costs for engaging security experts. In this paper, we present a novel approach that works with a variety of security logs. The logs are transformed into texts of a common format and then arranged as a corpus. Using a model trained by Word2vec on this corpus, we approximate the posterior probabilities of insider behaviours. We then label a transformed event as suspicious if its behavioural probability is smaller than a given threshold, and label a user as malicious if he/she is associated with multiple suspicious events. The experiments are undertaken with the Carnegie Mellon University (CMU) CERT Program’s insider threat database v6.2. The results not only demonstrate that the proposed approach is effective and scalable in practical applications but also provide guidance for tuning the parameters and thresholds.
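The pipeline described above (uniform text transformation, behavioural probability estimation, and two-level thresholding) can be illustrated with a minimal Python sketch. This is not the paper's implementation. To stay self-contained, it replaces the Word2vec model with a Laplace-smoothed frequency estimate of P(behaviour | user). The event fields, the `workhours`/`afterhours` bucketing, the function names, and the values of `threshold` and `min_suspicious` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def to_sentence(event):
    """Render an event from any log source as tokens in one shared format."""
    # Assumed bucketing: 08:00-17:59 counts as working hours.
    period = "workhours" if 8 <= event["hour"] < 18 else "afterhours"
    return [event["user"], period, event["source"], event["action"], event["target"]]


def fit_counts(corpus):
    """Count how often each behaviour (all tokens after the user) occurs per user."""
    counts = defaultdict(Counter)
    for user, *behaviour in corpus:
        counts[user][tuple(behaviour)] += 1
    return counts


def behaviour_probability(counts, sentence, alpha=1.0):
    """Laplace-smoothed P(behaviour | user); a stand-in for a Word2vec-based estimate."""
    user, *behaviour = sentence
    vocabulary = {b for per_user in counts.values() for b in per_user}
    user_counts = counts.get(user, Counter())
    total = sum(user_counts.values())
    return (user_counts[tuple(behaviour)] + alpha) / (total + alpha * len(vocabulary))


def label_users(corpus, threshold, min_suspicious):
    """Flag low-probability events, then users with several flagged events."""
    counts = fit_counts(corpus)
    suspicious = Counter()
    for sentence in corpus:
        if behaviour_probability(counts, sentence) < threshold:
            suspicious[sentence[0]] += 1
    return {user for user, n in suspicious.items() if n >= min_suspicious}


def event(user, hour, source, action, target):
    """Build a hypothetical, simplified log record (not the CERT schema)."""
    return {"user": user, "hour": hour, "source": source,
            "action": action, "target": target}


# Synthetic data: both users share a routine; bob adds three rare late-night events.
events = []
for user in ("alice", "bob"):
    events += [event(user, 9, "logon", "logon", "workstation")] * 10
    events += [event(user, 11, "http", "visit", "intranet")] * 10
events += [
    event("bob", 23, "device", "connect", "usb"),
    event("bob", 23, "file", "copy", "usb"),
    event("bob", 23, "http", "upload", "external"),
]

corpus = [to_sentence(e) for e in events]
print(label_users(corpus, threshold=0.1, min_suspicious=2))
```

On this synthetic corpus, routine events score about 0.39 to 0.44 and bob's three late-night events score about 0.07, so only `bob` is labelled malicious. In a fuller reproduction, `behaviour_probability` would be replaced by a probability derived from a trained Word2vec model, while the thresholding logic would stay the same.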
