The constant growth in the size of networks and in the volume of data they produce has made data analysis very challenging, especially once the data reaches the scale of big data, where detecting intrusions becomes even more difficult. Experts currently have very limited tools and methods for analyzing big data for security purposes. We must either devise new tools or use existing tools in novel ways to achieve big data security analysis. In this paper, we use Apache Spark, a big data tool, to analyze a large dataset for anomaly detection. Anomaly detection is performed using different machine learning algorithms: logistic regression, support vector machine, naive Bayes, decision trees, random forest, and k-means. All of these algorithms are more or less capable of detecting anomalies in big data, but we need to know how efficiently each performs. The main objective of this investigation is to find the most efficient algorithm for anomaly detection. To this end, we compare their training time, prediction time, and accuracy. The analysis was carried out on the KDD Cup 99 dataset. Although this dataset is only megabytes in size, it serves our purpose here for big data security analytics.
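The comparison criteria named above (training time, prediction time, and accuracy) can be illustrated with a minimal sketch. This is not the paper's Spark implementation: it is plain Python, and the `MajorityClassifier` stand-in, the `benchmark` helper, and the toy records are all hypothetical names and data introduced only for illustration. Any model exposing `fit` and `predict` could be substituted.

```python
import time
from collections import Counter


class MajorityClassifier:
    """Hypothetical stand-in model: always predicts the most common training label."""

    def fit(self, X, y):
        # Remember the label that occurs most often in the training set.
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        # Return the remembered label for every input row.
        return [self.label_ for _ in X]


def benchmark(model, X_train, y_train, X_test, y_test):
    """Measure training time, prediction time, and accuracy for one model."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_time = time.perf_counter() - start

    start = time.perf_counter()
    predictions = model.predict(X_test)
    predict_time = time.perf_counter() - start

    # Accuracy is the fraction of test rows whose prediction matches the label.
    correct = sum(p == t for p, t in zip(predictions, y_test))
    return {
        "train_time_s": train_time,
        "predict_time_s": predict_time,
        "accuracy": correct / len(y_test),
    }


# Toy records (assumed for illustration, not drawn from KDD Cup 99).
X_train = [[0, 1], [1, 0], [1, 1], [0, 0]]
y_train = ["normal", "normal", "attack", "normal"]
X_test = [[1, 1], [0, 1]]
y_test = ["attack", "normal"]

result = benchmark(MajorityClassifier(), X_train, y_train, X_test, y_test)
print(result)
```

Running the same `benchmark` call once per candidate model yields one row per algorithm, which is the kind of side-by-side table the comparison calls for.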