Comparison of System Call Representations for Intrusion Detection

Over the years, artificial neural networks have been applied successfully in many areas including IT security. Yet, neural networks can only process continuous input data. This is particularly challenging for security-related non-continuous data like system calls. This work focuses on four different options to preprocess sequences of system calls so that they can be processed by neural networks. These input options are based on one-hot encoding and learning word2vec or GloVe representations of system calls. As an additional option, we analyze if the mapping of system calls to their respective kernel modules is an adequate generalization step for (a) replacing system calls or (b) enhancing system call data with additional information regarding their context. However, when performing such preprocessing steps it is important to ensure that no relevant information is lost during the process. The overall objective of system call based intrusion detection is to categorize sequences of system calls as benign or malicious behavior. Therefore, this scenario is used to evaluate the different input options as a classification task. The results show, that each of the four different methods is a valid option when preprocessing input data, but the use of kernel modules only is not recommended because too much information is being lost during the mapping process.

[1]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[2]  John McHugh,et al.  Testing Intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory , 2000, TSEC.

[3]  Jiankun Hu,et al.  An efficient hidden Markov model training scheme for anomaly intrusion detection of server applications based on system calls , 2004, Proceedings. 2004 12th IEEE International Conference on Networks (ICON 2004) (IEEE Cat. No.04EX955).

[4]  Andrew S. Miner,et al.  Anomaly intrusion detection using one class SVM , 2004, Proceedings from the Fifth Annual IEEE SMC Information Assurance Workshop, 2004..

[5]  Steven A. Hofmeyr,et al.  Intrusion Detection via System Call Traces , 1997, IEEE Softw..

[6]  Salvatore J. Stolfo,et al.  Modeling system calls for intrusion detection with dynamic window sizes , 2001, Proceedings DARPA Information Survivability Conference and Exposition II. DISCEX'01.

[7]  Jiankun Hu,et al.  Generation of a new IDS test dataset: Time to retire the KDD collection , 2013, 2013 IEEE Wireless Communications and Networking Conference (WCNC).

[8]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[9]  Stephanie Forrest,et al.  A sense of self for Unix processes , 1996, Proceedings 1996 IEEE Symposium on Security and Privacy.

[10]  Xinghuo Yu,et al.  Evaluating Host-Based Anomaly Detection Systems: Application of the Frequency-Based Algorithms to ADFA-LD , 2014, NSS.

[11]  Seetha Hari,et al.  Learning From Imbalanced Data , 2019, Advances in Computer and Electrical Engineering.

[12]  Paul Jacob,et al.  Host Based Intrusion Detection System with Combined CNN/RNN Model , 2018, Nemesis/UrbReas/SoGood/IWAISe/GDM@PKDD/ECML.

[13]  Eleazar Eskin,et al.  A GEOMETRIC FRAMEWORK FOR UNSUPERVISED ANOMALY DETECTION: DETECTING INTRUSIONS IN UNLABELED DATA , 2002 .

[14]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[15]  Kuldip K. Paliwal,et al.  Intrusion detection using text processing techniques with a kernel based similarity measure , 2007, Comput. Secur..

[16]  Jiankun Hu,et al.  A Semantic Approach to Host-Based Intrusion Detection Systems Using Contiguousand Discontiguous System Call Patterns , 2014, IEEE Transactions on Computers.

[17]  Claudia Eckert,et al.  Deep Learning for Classification of Malware System Call Sequences , 2016, Australasian Conference on Artificial Intelligence.

[18]  Jürgen Schmidhuber,et al.  Learning Precise Timing with LSTM Recurrent Networks , 2003, J. Mach. Learn. Res..

[19]  Barak A. Pearlmutter,et al.  Detecting intrusions using system calls: alternative data models , 1999, Proceedings of the 1999 IEEE Symposium on Security and Privacy (Cat. No.99CB36344).

[20]  Stephanie Forrest,et al.  Intrusion Detection Using Sequences of System Calls , 1998, J. Comput. Secur..

[21]  Abdelwahab Hamou-Lhadj,et al.  A trace abstraction approach for host-based anomaly detection , 2015, 2015 IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA).

[22]  Andreas Hotho,et al.  IP2Vec: Learning Similarities Between IP Addresses , 2017, 2017 IEEE International Conference on Data Mining Workshops (ICDMW).

[23]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[24]  Yunheung Paek,et al.  LSTM-Based System-Call Language Modeling and Robust Ensemble Method for Designing Host-Based Intrusion Detection Systems , 2016, ArXiv.