An N-Gram and STF-IDF model for masquerade detection in a UNIX environment

A masquerader is someone who impersonates another user and operates a computer system with privileged access. Computer security problems caused by masqueraders are serious. Anomaly detection is considered the best way to detect masqueraders, but because of its low detection rate and high error rate it remains in the research phase. Thus far, a number of methods, such as the Support Vector Machine (SVM), the Hidden Markov Model (HMM), and the Naïve Bayes (N. Bayes) classifier, have been investigated in order to improve detection accuracy. In the present paper, we propose a method that integrates data mining and natural language processing, namely, the N-Gram Square-root Term Frequency-Inverse Document Frequency (N-Gram_STF-IDF) model. In the proposed method, the sequences to be examined are segmented into N-Grams, and non-normal users are then detected using an STF-IDF classifier. We evaluate the proposed method on the Schonlau and Greenberg data sets and compare the results with those obtained using various other methods.
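The two stages named above (N-Gram segmentation followed by STF-IDF weighting) can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: STF is taken as the square root of the raw n-gram count, IDF uses a smoothed logarithm, and the decision score is a cosine similarity against a per-user profile. The function names (`ngrams`, `fit_idf`, `stf_idf`, `build_profile`, `cosine`) and the sample command blocks are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def ngrams(commands, n=2):
    """Slide a window of length n over a sequence of UNIX commands."""
    return [tuple(commands[i:i + n]) for i in range(len(commands) - n + 1)]


def fit_idf(blocks, n=2):
    """Assumed smoothed IDF: log((1 + N) / (1 + df)) + 1 over training blocks."""
    df = Counter()
    for block in blocks:
        df.update(set(ngrams(block, n)))  # count each n-gram once per block
    total = len(blocks)
    idf = {g: math.log((1 + total) / (1 + d)) + 1.0 for g, d in df.items()}
    unseen = math.log(1 + total) + 1.0  # weight for n-grams never seen in training
    return idf, unseen


def stf_idf(block, idf, unseen, n=2):
    """Assumed STF-IDF weight: sqrt(term frequency) * idf."""
    counts = Counter(ngrams(block, n))
    return {g: math.sqrt(c) * idf.get(g, unseen) for g, c in counts.items()}


def build_profile(blocks, idf, unseen, n=2):
    """Sum the weighted vectors of a user's training blocks into one profile."""
    profile = Counter()
    for block in blocks:
        profile.update(stf_idf(block, idf, unseen, n))
    return dict(profile)


def cosine(a, b):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    # Hypothetical training blocks for one user (not from either data set).
    train = [
        ["ls", "cd", "ls", "vi", "make"],
        ["cd", "ls", "vi", "make", "ls"],
        ["ls", "vi", "make", "cd", "ls"],
    ]
    idf, unseen = fit_idf(train)
    profile = build_profile(train, idf, unseen)

    for block in (["ls", "cd", "ls", "vi", "make"],
                  ["nmap", "ssh", "wget", "chmod", "nmap"]):
        score = cosine(stf_idf(block, idf, unseen), profile)
        print(block, round(score, 3))
```

In a sketch like this, a block whose similarity to the user's profile falls below some threshold would be flagged as a possible masquerade; the threshold value and the choice of cosine similarity are design assumptions here, and the paper's actual classifier may differ.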
