Big Data Analytics framework for Peer-to-Peer Botnet detection using Random Forests

Abstract: Research on network traffic monitoring and analysis has struggled to scale to massive amounts of data in real time. Some vertical scaling solutions provide good implementations of signature-based detection. Unfortunately, these approaches treat network flows across different subnets separately and cannot apply anomaly-based classification when attacks originate from multiple machines at a lower speed, as in the case of Peer-to-Peer Botnets. In this paper the authors build on the progress of open source tools such as Hadoop, Hive and Mahout to provide a scalable implementation of a quasi-real-time intrusion detection system. The implementation is used to detect Peer-to-Peer Botnet attacks using a machine learning approach. The contributions of this paper are as follows: (1) building a distributed framework using Hive for sniffing and processing network traces, enabling extraction of dynamic network features; (2) using the parallel processing power of Mahout to build a Random Forest based Decision Tree model, which is applied to the problem of Peer-to-Peer Botnet detection in quasi-real-time. The implementation setup and performance metrics are presented as initial observations, and future extensions are proposed.
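
To make contribution (1) concrete, the following is a minimal single-machine sketch of the kind of per-flow aggregation that a GROUP BY query would perform over captured packet records. It is not the authors' Hive implementation: the `Packet` record, the `flow_features` function, and the four features computed here are hypothetical choices made for illustration, since the abstract does not list the actual feature set.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical packet record; field names are assumptions."""
    ts: float    # capture timestamp in seconds
    src: str     # source IP address
    dst: str     # destination IP address
    sport: int   # source port
    dport: int   # destination port
    proto: str   # transport protocol, e.g. "tcp" or "udp"
    size: int    # packet length in bytes


FlowKey = Tuple[str, str, int, int, str]


def flow_features(packets: Iterable[Packet]) -> Dict[FlowKey, Dict[str, float]]:
    """Group packets by 5-tuple and compute simple per-flow statistics."""
    groups: Dict[FlowKey, List[Packet]] = defaultdict(list)
    for p in packets:
        groups[(p.src, p.dst, p.sport, p.dport, p.proto)].append(p)

    features: Dict[FlowKey, Dict[str, float]] = {}
    for key, pkts in groups.items():
        total_bytes = sum(p.size for p in pkts)
        first_seen = min(p.ts for p in pkts)
        last_seen = max(p.ts for p in pkts)
        features[key] = {
            "packets": len(pkts),
            "bytes": total_bytes,
            "duration": last_seen - first_seen,
            "mean_size": total_bytes / len(pkts),
        }
    return features
```

Each resulting feature row could then serve as one input vector for a classifier such as a Random Forest; in a distributed setting the same grouping would be expressed as a query over a table of packet records rather than an in-memory loop.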
