Evaluation of Deep Learning Methods Efficiency for Malicious and Benign System Calls Classification on the AWSCTD

The increasing amount of malware and cyberattacks on a host level increases the need for a reliable anomaly-based host IDS (HIDS) that would be able to deal with zero-day attacks and would ensure low false alarm rate (FAR), which is critical for the detection of such activity. Deep learning methods such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are considered to be highly suitable for solving data-driven security solutions. Therefore, it is necessary to perform the comparative analysis of such methods in order to evaluate their efficiency in attack classification as well as their ability to distinguish malicious and benign activity. In this article, we present the results achieved with the AWSCTD (attack-caused Windows OS system calls traces dataset), which can be considered as the most exhaustive set of host-level anomalies at the moment, including 112.56 million system calls from 12110 executable malware samples and 3145 benign software samples with 16.3 million system calls. The best results were obtained with CNNs with up to 90.0% accuracy for family classification and 95.0% accuracy for malicious/benign determination. RNNs demonstrated slightly inferior results. Furthermore, CNN tuning via an increase in the number of layers should make them practically applicable for host-level anomaly detection.

[1]  Jiankun Hu,et al.  Evaluating host-based anomaly detection systems: A preliminary analysis of ADFA-LD , 2013, 2013 6th International Congress on Image and Signal Processing (CISP).

[2]  Nikolaj Goranin,et al.  Towards a Robust Method of Dataset Generation of Malicious Activity for Anomaly-Based HIDS Training and Presentation of AWSCTD Dataset , 2018, Balt. J. Mod. Comput..

[3]  Howon Kim,et al.  Long Short Term Memory Recurrent Neural Network Classifier for Intrusion Detection , 2016, 2016 International Conference on Platform Technology and Service (PlatCon).

[4]  Hervé Debar,et al.  An application of a recurrent network to an intrusion detection system , 1992, [Proceedings 1992] IJCNN International Joint Conference on Neural Networks.

[5]  Jiankun Hu,et al.  Windows Based Data Sets for Evaluation of Robustness of Host Based Intrusion Detection Systems (IDS) to Zero-Day and Stealth Attacks , 2016, Future Internet.

[6]  Jiankun Hu,et al.  An Approach for Host-Based Intrusion Detection System Design Using Convolutional Neural Network , 2017, MONAMI.

[7]  Md Zahangir Alom,et al.  Intrusion detection using deep belief networks , 2015, 2015 National Aerospace and Electronics Conference (NAECON).

[8]  A. Halim Zaim,et al.  A hybrid intrusion detection system design for computer network security , 2009, Comput. Electr. Eng..

[9]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[10]  Vijay Kumar Jha,et al.  Data Mining in Intrusion Detection: A Comparative Study of Methods, Types and Data Sets , 2013 .

[11]  Sang Hyun Kim,et al.  Method of intrusion detection using deep neural network , 2017, 2017 IEEE International Conference on Big Data and Smart Computing (BigComp).

[12]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[13]  Yoshua Bengio,et al.  On the Properties of Neural Machine Translation: Encoder–Decoder Approaches , 2014, SSST@EMNLP.

[14]  David Slater,et al.  Malicious Behavior Detection using Windows Audit Logs , 2015, AISec@CCS.

[15]  Francesco Visin,et al.  A guide to convolution arithmetic for deep learning , 2016, ArXiv.

[16]  Claudia Eckert,et al.  Deep Learning for Classification of Malware System Call Sequences , 2016, Australasian Conference on Artificial Intelligence.

[17]  Andrew Hay,et al.  OSSEC Host-Based Intrusion Detection Guide , 2008 .

[18]  Gabriel Maciá-Fernández,et al.  Anomaly-based network intrusion detection: Techniques, systems and challenges , 2009, Comput. Secur..

[19]  Shikha Agrawal,et al.  Survey on Anomaly Detection using Data Mining Techniques , 2015, KES.

[20]  Dong Yu,et al.  Deep Learning: Methods and Applications , 2014, Found. Trends Signal Process..

[21]  Simon Haykin,et al.  GradientBased Learning Applied to Document Recognition , 2001 .

[22]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[23]  Joel J. P. C. Rodrigues,et al.  Digital signature of network segment for healthcare environments support , 2014 .

[24]  Nikolaj Goranin,et al.  Investigation of AWSCTD dataset applicability for malware type classification , 2018 .

[25]  Jugal K. Kalita,et al.  Network Anomaly Detection: Methods, Systems and Tools , 2014, IEEE Communications Surveys & Tutorials.

[26]  Yuefei Zhu,et al.  A Deep Learning Approach for Intrusion Detection Using Recurrent Neural Networks , 2017, IEEE Access.

[27]  Azween Abdullah,et al.  Artificial neural network approaches to intrusion detection: a review , 2009, IEEE ICT 2009.

[28]  Adeel Akram,et al.  A comparative analysis of artificial neural network technologies in intrusion detection systems , 2006 .

[29]  Robert K. Cunningham,et al.  Evaluating Intrusion Detection Systems Without Attacking Your Friends: The 1998 DARPA Intrusion Detection Evaluation , 1999 .

[30]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[31]  Jiankun Hu,et al.  Host-Based Anomaly Intrusion Detection , 2010, Handbook of Information and Communication Security.

[32]  Zheng Wang,et al.  Deep Learning-Based Intrusion Detection With Adversaries , 2018, IEEE Access.