Real-time intrusion detection system for ultra-high-speed big data environments

In recent years, the number of people using the Internet and network services has been increasing steadily. Every day, a large amount of data, ranging from petabytes to zettabytes, is generated over the Internet at very high speed. At the same time, security threats against networks, the Internet, websites, and enterprise networks are growing. Detecting intrusions in real time in such an ultra-high-speed environment is therefore a challenging task. Many machine-learning-based intrusion detection systems (IDSs) have been proposed for various types of network attacks. Most of them cannot detect recent unknown attacks, while the others do not provide a real-time solution to these challenges. To address these problems, we propose a real-time intrusion detection system for ultra-high-speed big data environments, implemented on Hadoop. The proposed system has a four-layered IDS architecture consisting of the capturing layer, the filtration and load-balancing layer, the processing (Hadoop) layer, and the decision-making layer. Furthermore, we propose a feature selection scheme that selects nine parameters for classification using FSR and BER, as well as from an analysis of the DARPA datasets. In addition, six machine learning classifiers are used to evaluate the proposed system: J48, REPTree, random forest, conjunctive rule, support vector machine, and Naïve Bayes. Results show that, among these classifiers, REPTree and J48 are the best in terms of both accuracy and efficiency. The proposed architecture is evaluated for accuracy in terms of true positive (TP) and false positive (FP) rates, for efficiency in terms of processing time, and by comparison with traditional techniques. With REPTree and J48, it achieves a TP rate above 99% and an FP rate below 0.001%. Overall, the system is more accurate than existing IDSs and is able to work in real time in ultra-high-speed big data environments.
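To illustrate how the layers and the TP/FP evaluation fit together, the following is a minimal, single-machine sketch and not the authors' Hadoop implementation. It assumes a hypothetical `Packet` record, round-robin partitioning as a stand-in for the filtration and load-balancing layer, and a hypothetical `threshold_rule` in place of a trained classifier such as J48 or REPTree. The TP rate is computed as TP / (TP + FN) and the FP rate as FP / (FP + TN).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, Tuple


@dataclass
class Packet:
    # Hypothetical record; the described system would carry nine selected features.
    features: Sequence[float]
    is_attack: bool  # ground-truth label, used only for evaluation


def filter_and_balance(packets: Iterable[Packet], n_nodes: int) -> List[List[Packet]]:
    """Filtration/load-balancing layer (sketch): drop records with no features,
    then spread the rest round-robin over n_nodes processing partitions."""
    partitions: List[List[Packet]] = [[] for _ in range(n_nodes)]
    kept = (p for p in packets if p.features)
    for i, p in enumerate(kept):
        partitions[i % n_nodes].append(p)
    return partitions


def process(
    partitions: List[List[Packet]],
    classify: Callable[[Sequence[float]], bool],
) -> List[Tuple[bool, bool]]:
    """Processing layer (sketch): classify every packet in every partition,
    standing in for parallel Hadoop tasks. Returns (actual, predicted) pairs."""
    return [(p.is_attack, classify(p.features)) for part in partitions for p in part]


def tp_fp_rates(results: List[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Decision-layer metrics: TP rate = TP/(TP+FN), FP rate = FP/(FP+TN)."""
    tp = sum(1 for actual, pred in results if actual and pred)
    fn = sum(1 for actual, pred in results if actual and not pred)
    fp = sum(1 for actual, pred in results if not actual and pred)
    tn = sum(1 for actual, pred in results if not actual and not pred)
    tp_rate = tp / (tp + fn) if tp + fn else 0.0
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    return tp_rate, fp_rate


def threshold_rule(features: Sequence[float]) -> bool:
    # Hypothetical stand-in for a trained J48/REPTree model.
    return features[0] > 0.5


packets = [
    Packet([0.9], True), Packet([0.8], True), Packet([0.2], True),
    Packet([0.1], False), Packet([0.7], False), Packet([0.3], False),
    Packet([0.05], False),
]
rates = tp_fp_rates(process(filter_and_balance(packets, n_nodes=3), threshold_rule))
print(rates)  # (0.6666666666666666, 0.25)
```

With this toy data, two of three attacks are flagged (TP rate 2/3) and one of four normal packets is misflagged (FP rate 0.25). In the real system, the classifier, the feature set, and the partitioning would come from the trained model and the Hadoop cluster rather than from these placeholders.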