Real-time cyber attack analysis on Hadoop ecosystem using machine learning algorithms

Big Data technologies are cutting-edge technologies that generate, collect, store and analyse tremendous amounts of data. Like any other IT revolution, they also face major challenges that hinder their adoption by the wider community and impede the extraction of value from Big Data with the pace and accuracy it promises. In this paper we first offer an alternative view of the "Big Data Cloud", with the main aim of making this complex technology easy to understand for new researchers and of identifying gaps efficiently. In our lab experiment, we successfully implemented cyber-attacks on Apache Hadoop's management interface, "Ambari". Working from the premise that "attackers only need one way in", we attacked this management interface, brought down all communication between Ambari and the Hadoop ecosystem, and collected performance data from the Ambari Virtual Machine (VM) and the Big Data Cloud hypervisor. We also detected these cyber-attacks with 94.0187% accuracy using modern machine learning algorithms. Based on the existing research, no similar experiment in detecting cyber-attacks on Hadoop using performance data has previously been attempted.
