Unsupervised Feature Learning With Distributed Stacked Denoising Sparse Autoencoder for Abnormal Behavior Detection Using Apache Spark

The modern age of internet connectivity and advanced communication technologies has given cyber attackers an ever-larger space in which to operate, creating a need for fast and accurate detection of sophisticated attacks. Abnormal behavior detection is a data analysis task that identifies interesting and emerging patterns in data. Much research in this area has used machine learning and deep learning techniques to distinguish anomalous traffic from normal traffic. However, due to the massive volumes of data that need to be analyzed and the rapid evolution of attacks, most existing machine learning and deep learning solutions for network intrusion detection suffer from low accuracy and limited scalability over long periods of time; an efficient distributed deep detection method is therefore required. In this paper, we propose a novel semi-supervised distributed approach based on a stacked denoising sparse autoencoder and an SVM for large-scale intrusion detection systems. Our aim is to explore the suitability of big data analytics and deep learning techniques in the context of abnormal behavior detection systems. First, a distributed stacked denoising sparse autoencoder is applied to perform unsupervised non-linear dimensionality reduction. Then, the reduced features are fed to a distributed SVM for classification. The approach is implemented using the iterative reduce paradigm on Spark. Experimental results on four cybersecurity datasets, KDD Cup'99, NSL-KDD, UNSW-NB15, and CICIDS2017, show that the proposed method yields promising performance and reduces detection time.
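
To make the first stage concrete, the following is a minimal single-machine sketch of the objective that one denoising sparse autoencoder layer typically minimizes: reconstruction error against the clean input after masking noise, plus a KL-divergence sparsity penalty on the mean hidden activations. It is an illustration under stated assumptions, not the paper's implementation. The function names (`corrupt`, `layer`, `kl_sparsity`, `dsae_loss`) and the default values of `rho`, `beta`, and `drop_prob` are hypothetical, and stacking, training, the Spark distribution, and the SVM stage are all omitted.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def corrupt(x, drop_prob, rng):
    """Masking noise: zero each input feature with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]


def layer(x, weights, biases):
    """Dense sigmoid layer; weights[j] holds the input weights of unit j."""
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def kl_sparsity(rho, rho_hat):
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat)."""
    return (rho * math.log(rho / rho_hat)
            + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))


def dsae_loss(batch, enc_w, enc_b, dec_w, dec_b,
              rho=0.05, beta=3.0, drop_prob=0.3, seed=0):
    """Mean reconstruction error on corrupted inputs plus sparsity penalty.

    rho, beta and drop_prob are illustrative defaults (assumptions),
    not values taken from the paper.
    """
    rng = random.Random(seed)
    mean_act = [0.0] * len(enc_w)
    recon = 0.0
    for x in batch:
        h = layer(corrupt(x, drop_prob, rng), enc_w, enc_b)
        x_hat = layer(h, dec_w, dec_b)
        # Error is measured against the clean input, not the corrupted one.
        recon += 0.5 * sum((a - b) ** 2 for a, b in zip(x_hat, x))
        mean_act = [m + a / len(batch) for m, a in zip(mean_act, h)]
    penalty = sum(kl_sparsity(rho, r) for r in mean_act)
    return recon / len(batch) + beta * penalty
```

In a stacked setup, the hidden activations of one trained layer would serve as the input to the next, and the final layer's activations would form the reduced feature vector passed to the classifier.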