Unsupervised Insider Detection Through Neural Feature Learning and Model Optimisation

The insider threat is a significant security concern for both organisations and government sectors. Traditional machine learning-based insider threat detection approaches usually rely on domain-focused feature engineering, which is expensive and often impractical. In this paper, we propose an autoencoder-based approach that automatically learns discriminative features of insider behaviours, thus relieving security experts of tedious inspection tasks. Specifically, a Word2vec model is trained on a corpus transformed from various security logs to generate event representations. Instead of manually selecting the Word2vec model parameters, we develop an autoencoder-based “parameter tuner” for the model so that it produces an optimal feature set. Detection is then performed by examining the reconstruction error of an autoencoder for each transformed event. The approach is evaluated on the Carnegie Mellon University (CMU) CERT Program insider threat database. Experimental results demonstrate that our proposed approach can achieve an extremely low false-positive rate (FPR) while identifying all malicious events.
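The detection step rests on a simple idea: an autoencoder fitted only to benign events reconstructs them well, so an event with unusually high reconstruction error is flagged as potentially malicious. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. It assumes toy 4-dimensional vectors standing in for Word2vec event representations, a one-unit tanh autoencoder trained with plain SGD, and a 99th-percentile threshold on the training errors. The helper names (`make_normal_events`, `train_autoencoder`, `reconstruction_error`) and all hyperparameters are hypothetical.

```python
import math
import random


def forward(x, w1, b1, w2, b2):
    """Encode x with a tanh hidden layer, then decode it linearly."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    y = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(w2, b2)]
    return h, y


def train_autoencoder(data, hidden=1, lr=0.05, epochs=200, seed=0):
    """Fit the autoencoder with plain SGD on squared reconstruction error.
    Hidden size, learning rate and epoch count are illustrative assumptions."""
    rng = random.Random(seed)
    d = len(data[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(d)]
    b2 = [0.0] * d
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            h, y = forward(x, w1, b1, w2, b2)
            dy = [yi - xi for yi, xi in zip(y, x)]  # gradient of 0.5 * ||y - x||^2
            # Backpropagate to the hidden pre-activation (tanh' = 1 - h^2),
            # using the decoder weights before they are updated.
            dh = [sum(dy[i] * w2[i][j] for i in range(d)) * (1.0 - h[j] ** 2)
                  for j in range(hidden)]
            for i in range(d):
                for j in range(hidden):
                    w2[i][j] -= lr * dy[i] * h[j]
                b2[i] -= lr * dy[i]
            for j in range(hidden):
                for k in range(d):
                    w1[j][k] -= lr * dh[j] * x[k]
                b1[j] -= lr * dh[j]
    return w1, b1, w2, b2


def reconstruction_error(x, params):
    """Mean squared error between an event vector and its reconstruction."""
    _, y = forward(x, *params)
    return sum((yi - xi) ** 2 for yi, xi in zip(y, x)) / len(x)


def make_normal_events(n, seed=1):
    """Hypothetical stand-in for Word2vec vectors of benign events:
    4-D points scattered closely around a single line."""
    rng = random.Random(seed)
    direction = [1.0, 2.0, -1.0, 0.5]
    events = []
    for _ in range(n):
        t = rng.uniform(-0.5, 0.5)  # one position along the line per event
        events.append([t * v + rng.gauss(0.0, 0.02) for v in direction])
    return events


normal = make_normal_events(200)
params = train_autoencoder(normal)
train_errors = [reconstruction_error(x, params) for x in normal]
# Assumed decision rule: flag anything above the 99th percentile of training error.
threshold = sorted(train_errors)[int(0.99 * len(train_errors))]

suspicious = [0.5, -0.5, 0.5, 0.5]  # hypothetical event far from the benign pattern
print(reconstruction_error(suspicious, params) > threshold)
```

Because the toy benign events lie near a single line, a one-unit bottleneck suffices here; real event embeddings would likely need a wider bottleneck, and the choice of threshold trades the false-positive rate against missed detections.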
