LSTM-Based System-Call Language Modeling and Robust Ensemble Method for Designing Host-Based Intrusion Detection Systems

In computer security, designing a robust intrusion detection system is one of the most fundamental and important problems. In this paper, we propose a system-call language-modeling approach for designing anomaly-based host intrusion detection systems. To remedy the issue of high false-alarm rates commonly arising in conventional methods, we employ a novel ensemble method that blends multiple thresholding classifiers into a single one, making it possible to accumulate 'highly normal' sequences. The proposed system-call language model has various advantages leveraged by the fact that it can learn the semantic meaning and interactions of each system call that existing methods cannot effectively consider. Through diverse experiments on public benchmark datasets, we demonstrate the validity and effectiveness of the proposed method. Moreover, we show that our model possesses high portability, which is one of the key aspects of realizing successful intrusion detection systems.

[1]  V. Rao Vemuri,et al.  Using Text Categorization Techniques for Intrusion Detection , 2002, USENIX Security Symposium.

[2]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[3]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[4]  Geoffrey E. Hinton,et al.  Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[5]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[6]  John McHugh,et al.  Testing Intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory , 2000, TSEC.

[7]  Hervé Debar,et al.  A neural network component for an intrusion detection system , 1992, Proceedings 1992 IEEE Computer Society Symposium on Research in Security and Privacy.

[8]  Xinghuo Yu,et al.  Evaluating Host-Based Anomaly Detection Systems: Application of the Frequency-Based Algorithms to ADFA-LD , 2014, NSS.

[9]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[10]  Yoshua Bengio,et al.  A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..

[11]  Geoffrey Zweig,et al.  Linguistic Regularities in Continuous Space Word Representations , 2013, NAACL.

[12]  Wojciech Zaremba,et al.  Recurrent Neural Network Regularization , 2014, ArXiv.

[13]  Jason Weston,et al.  A Neural Attention Model for Abstractive Sentence Summarization , 2015, EMNLP.

[14]  R.K. Cunningham,et al.  Evaluating intrusion detection systems: the 1998 DARPA off-line intrusion detection evaluation , 2000, Proceedings DARPA Information Survivability Conference and Exposition. DISCEX'00.

[15]  Phil Blunsom,et al.  Teaching Machines to Read and Comprehend , 2015, NIPS.

[16]  Anil Somayaji,et al.  Analysis of the 1999 DARPA/Lincoln Laboratory IDS evaluation data with NetADHICT , 2009, 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications.

[17]  Barak A. Pearlmutter,et al.  Detecting intrusions using system calls: alternative data models , 1999, Proceedings of the 1999 IEEE Symposium on Security and Privacy (Cat. No.99CB36344).

[18]  Kymie M. C. Tan,et al.  Determining the operational limits of an anomaly-based intrusion detector , 2003, IEEE J. Sel. Areas Commun..

[19]  Xinghuo Yu,et al.  A simple and efficient hidden Markov model scheme for host-based anomaly intrusion detection , 2009, IEEE Network.

[20]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[21]  Andrew H. Sung,et al.  Intrusion detection using neural networks and support vector machines , 2002, Proceedings of the 2002 International Joint Conference on Neural Networks. IJCNN'02 (Cat. No.02CH37290).

[22]  Jiankun Hu,et al.  Generation of a new IDS test dataset: Time to retire the KDD collection , 2013, 2013 IEEE Wireless Communications and Networking Conference (WCNC).

[23]  Andrew L. Maas Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .

[24]  Ralf C. Staudemeyer,et al.  Applying long short-term memory recurrent neural networks to intrusion detection , 2015 .

[25]  Stephanie Forrest,et al.  Intrusion Detection Using Sequences of System Calls , 1998, J. Comput. Secur..

[26]  Stephanie Forrest,et al.  A sense of self for Unix processes , 1996, Proceedings 1996 IEEE Symposium on Security and Privacy.

[27]  David A. Wagner,et al.  Mimicry attacks on host-based intrusion detection systems , 2002, CCS '02.

[28]  Yoshua Bengio,et al.  Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.

[29]  Jiankun Hu,et al.  A multi-layer model for anomaly intrusion detection using program sequences of system calls , 2003, The 11th IEEE International Conference on Networks, 2003. ICON2003..

[30]  Razvan Pascanu,et al.  On the difficulty of training recurrent neural networks , 2012, ICML.

[31]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[32]  Stephanie Forrest,et al.  The Evolution of System-Call Monitoring , 2008, 2008 Annual Computer Security Applications Conference (ACSAC).

[33]  Jennifer G. Dy,et al.  System Call Anomaly Detection Using Multi-HMMs , 2014, 2014 IEEE Eighth International Conference on Software Security and Reliability-Companion.

[34]  Andrew P. Bradley,et al.  The use of the area under the ROC curve in the evaluation of machine learning algorithms , 1997, Pattern Recognit..

[35]  Jiankun Hu,et al.  A Semantic Approach to Host-Based Intrusion Detection Systems Using Contiguousand Discontiguous System Call Patterns , 2014, IEEE Transactions on Computers.

[36]  Yonghui Wu,et al.  Exploring the Limits of Language Modeling , 2016, ArXiv.

[37]  V Jyothsna A Review of Anomaly based Intrusion Detection Systems , 2011 .

[38]  Ralf C. Staudemeyer,et al.  Evaluating performance of long short-term memory recurrent neural networks on intrusion detection data , 2013, SAICSIT '13.