Unsupervised Ensemble Based Learning for Insider Threat Detection

Insider threats are veritable needles in a haystack. They occur rarely and, when they do occur, are usually well masked within normal operation. Detecting them requires identifying these rare anomalous needles in a contextualized setting where behaviors constantly evolve over time. To support this refined search, this paper proposes and tests an unsupervised, ensemble-based learning algorithm that identifies anomalies by maintaining a compressed dictionary of repetitive sequences found throughout dynamic data streams of unbounded length. In the unsupervised setting, compression-based techniques are used to model common behavior sequences. The resulting classifier exhibits a substantial increase in classification accuracy on data streams containing insider threat anomalies. The ensemble of classifiers allows the unsupervised approach to outperform traditional static learning approaches and improves effectiveness over supervised learning approaches.
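To make the idea of a compressed dictionary of repetitive sequences more concrete, the following is a minimal sketch and not the algorithm evaluated in the paper. It assumes the stream arrives as chunks of command tokens, builds one dictionary per chunk with an LZW-style scan, and lets a small ensemble vote on whether a new sequence is anomalous. The names `build_dictionary`, `anomaly_score`, and `SequenceEnsemble`, the `min_count` and `threshold` parameters, the coverage-based score, and the oldest-first eviction are all illustrative assumptions.

```python
from collections import Counter, deque


def build_dictionary(tokens, min_count=2):
    """LZW-style scan of one chunk; keep multi-token phrases seen repeatedly.

    Hypothetical helper: min_count is an assumed pruning parameter.
    """
    counts = Counter()
    phrase = ()
    for tok in tokens:
        candidate = phrase + (tok,)
        if len(candidate) == 1 or candidate in counts:
            phrase = candidate          # known phrase: keep extending it
        else:
            counts[candidate] = 0       # new phrase enters the dictionary
            phrase = (tok,)             # restart from the current token
        counts[phrase] += 1
    # "Compress" the dictionary by dropping phrases that rarely recur.
    return {p for p, c in counts.items() if len(p) > 1 and c >= min_count}


def anomaly_score(patterns, tokens):
    """Fraction of tokens NOT covered by a greedy longest-pattern match."""
    if not tokens:
        return 0.0
    max_len = max((len(p) for p in patterns), default=0)
    covered, i = 0, 0
    while i < len(tokens):
        step = 1
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in patterns:
                covered += n
                step = n
                break
        i += step
    return 1.0 - covered / len(tokens)


class SequenceEnsemble:
    """Fixed-size ensemble of per-chunk dictionaries with majority voting."""

    def __init__(self, size=3, threshold=0.5):
        self.models = deque(maxlen=size)  # assumption: oldest model is evicted
        self.threshold = threshold

    def update(self, chunk):
        self.models.append(build_dictionary(chunk))

    def is_anomalous(self, tokens):
        votes = sum(anomaly_score(m, tokens) > self.threshold
                    for m in self.models)
        return votes * 2 > len(self.models)


# Usage: train on three chunks of routine activity, then test two sequences.
ensemble = SequenceEnsemble(size=3, threshold=0.5)
for _ in range(3):
    ensemble.update(["ls", "cd", "vim", "make"] * 10)

routine = ["ls", "cd", "vim", "make", "ls", "cd"]
unusual = ["scp", "secret.db", "rm", "history", "scp", "logs"]
print(ensemble.is_anomalous(routine), ensemble.is_anomalous(unusual))
```

Because each chunk gets its own dictionary and the ensemble holds only the most recent few, the model can follow behavior that drifts over time without storing the unbounded stream. A fuller treatment would choose which model to replace based on its performance rather than its age.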
