Anomaly Detection Model Based on Hadoop Platform and Weka Interface

Anomaly detection plays an increasingly important role in network security, yet detecting and processing anomalies in big data in real time remains a difficult task. To address this, this paper presents a model that combines cloud computing with machine learning. Hadoop is a widely used open-source cloud computing framework for big data. In the model, traffic data are stored in HDFS and processed by MapReduce. In addition, a machine learning module selects the best-performing algorithm from multiple candidates by calling the Weka interface. Naïve Bayes, decision tree, and SVM classifiers are used to validate accuracy and efficiency. Experimental results demonstrate that the method performs well, with a detection accuracy above 90%.
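
The algorithm-selection step described above can be illustrated with a minimal sketch. It is not the paper's implementation and does not call Weka: the three classifiers are simplified pure-Python stand-ins, the two traffic features (packets per second, mean packet size) and the synthetic data are hypothetical, and choosing by k-fold cross-validated accuracy is an assumed selection criterion that the abstract does not specify.

```python
import math
import random
import statistics
from collections import Counter, defaultdict


class MajorityClass:
    """Baseline: always predicts the most common training label."""
    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, x):
        return self.label


class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per (class, feature) pair."""
    def fit(self, X, y):
        by_class = defaultdict(list)
        for x, label in zip(X, y):
            by_class[label].append(x)
        self.model = {}
        for label, rows in by_class.items():
            # Floor the standard deviation so a constant feature cannot divide by zero.
            params = [(statistics.mean(col), max(statistics.pstdev(col), 1e-6))
                      for col in zip(*rows)]
            self.model[label] = (math.log(len(rows) / len(X)), params)
        return self

    def predict(self, x):
        def log_score(label):
            log_prior, params = self.model[label]
            return log_prior + sum(-math.log(sd) - 0.5 * ((v - mu) / sd) ** 2
                                   for v, (mu, sd) in zip(x, params))
        return max(self.model, key=log_score)


class DecisionStump:
    """One-level decision tree: the single split with the fewest training errors."""
    def fit(self, X, y):
        self.fallback = Counter(y).most_common(1)[0][0]
        self.rule = None
        best_correct = -1
        for j in range(len(X[0])):
            for t in sorted({x[j] for x in X}):
                left = [lab for x, lab in zip(X, y) if x[j] <= t]
                right = [lab for x, lab in zip(X, y) if x[j] > t]
                if not right:  # threshold at the maximum value splits nothing
                    continue
                l_lab = Counter(left).most_common(1)[0][0]
                r_lab = Counter(right).most_common(1)[0][0]
                correct = left.count(l_lab) + right.count(r_lab)
                if correct > best_correct:
                    best_correct, self.rule = correct, (j, t, l_lab, r_lab)
        return self

    def predict(self, x):
        if self.rule is None:
            return self.fallback
        j, t, l_lab, r_lab = self.rule
        return l_lab if x[j] <= t else r_lab


def cross_val_accuracy(make_model, X, y, k=5, seed=0):
    """Fraction of records classified correctly across k held-out folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for fold in range(k):
        test = set(idx[fold::k])
        train = [i for i in idx if i not in test]
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(model.predict(X[i]) == y[i] for i in test)
    return correct / len(X)


def select_best(candidates, X, y, k=5):
    """Return the name of the highest-scoring candidate and all scores."""
    scores = {name: cross_val_accuracy(make, X, y, k) for name, make in candidates.items()}
    return max(scores, key=scores.get), scores


def make_traffic(n_normal=80, n_anomalous=20, seed=0):
    """Synthetic (packets_per_second, mean_packet_size) records; purely illustrative."""
    rng = random.Random(seed)
    X = [(rng.gauss(100, 15), rng.gauss(500, 100)) for _ in range(n_normal)]
    X += [(rng.gauss(400, 50), rng.gauss(80, 20)) for _ in range(n_anomalous)]
    y = ["normal"] * n_normal + ["anomaly"] * n_anomalous
    return X, y


CANDIDATES = {"majority": MajorityClass,
              "naive_bayes": GaussianNaiveBayes,
              "decision_stump": DecisionStump}

X, y = make_traffic()
best, scores = select_best(CANDIDATES, X, y)
print(best, scores)
```

Scoring every candidate on the same folds keeps the comparison fair. In a Weka-based system the stand-in classes would presumably be replaced by Weka classifiers evaluated through Weka's own cross-validation facilities.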
