Unsupervised Scanning Behavior Detection Based on Distribution of Network Traffic Features Using Robust Autoencoders

Applying machine learning to detection methods in intrusion detection systems has attracted interest from vendors and researchers for two reasons: most existing signature-based methods cannot keep up with the growing number of incidents, and some machine learning methods, particularly neural networks (NNs), have achieved remarkable results in computer vision and other fields. When NNs are applied to intrusion detection based on network traffic, it is important to adopt unsupervised rather than supervised methods. Unsupervised methods are better suited to detecting unseen attacks, and because network traffic behavior depends heavily on the individual host or network, preparing labels for each environment is unrealistic. Nevertheless, unsupervised methods have received little attention in previous studies. A flow-based detection method using replicator NNs, also known as autoencoders, is one of the few unsupervised methods. Its experimental results showed that unsupervised anomaly detection with autoencoders is effective for detecting scanning behavior; however, the large number of false positives remains a challenge. In this paper, we propose flow-based intrusion detection using robust autoencoders (RAEs). An RAE splits the training data into benign and outlier features in an unsupervised manner and learns low-dimensional representations from the benign data only; the splitting and learning steps are repeated alternately. We argue that RAEs reduce false positives on the test data by removing outliers from the training data. We evaluated the method on a real-world traffic dataset, MAWI (Measurement and Analysis on the WIDE Internet). The experiments showed that false positives were significantly reduced and attacks were detected more easily.
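The alternating split-and-learn loop described above can be illustrated with a minimal sketch. It is not the paper's implementation: a rank-k truncated SVD (a linear autoencoder) stands in for the neural autoencoder, and the split uses a row-wise shrinkage step, which is an assumption because the abstract does not specify the splitting rule. The names `linear_autoencode`, `row_shrink`, `robust_split`, `k`, and `lam` are hypothetical.

```python
import numpy as np

def linear_autoencode(L, k):
    """Reconstruct L through a rank-k linear bottleneck (truncated SVD).

    Assumption: this stands in for training a neural autoencoder on L.
    """
    mean = L.mean(axis=0)
    U, s, Vt = np.linalg.svd(L - mean, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k] + mean

def row_shrink(R, lam):
    """Shrink each row's Euclidean norm by lam; rows below lam become zero."""
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return R * scale

def robust_split(X, k=1, lam=1.0, n_iter=20):
    """Alternate between learning from the benign part and re-splitting."""
    S = np.zeros_like(X)                  # outlier part, initially empty
    for _ in range(n_iter):
        L = X - S                         # current estimate of the benign part
        L_hat = linear_autoencode(L, k)   # learn only from the benign part
        S = row_shrink(X - L_hat, lam)    # large residual rows go to the outlier part
    outlier_rows = np.linalg.norm(S, axis=1) > 0
    return L_hat, S, outlier_rows
```

Here each row of `X` would be one feature vector derived from flows, and `outlier_rows` marks the training rows excluded from the benign part. The threshold `lam` controls how aggressively rows are treated as outliers and would need tuning for real traffic.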
