A Big Data Architecture for Security Data and Its Application to Phishing Characterization

As the Internet grows, so do its cybersecurity problems. Attackers have explored many types of malicious activity, and existing defense mechanisms cannot eliminate these threats completely, which perpetuates a continuous arms race. Developing applications to mitigate those threats is complicated by the growing volume of data and by its variety, since the data can come from many different sources. In this paper we present an architecture built on top of Big Data frameworks that aims to mitigate cybersecurity problems such as spam and phishing, and we show how it is being used to study spam and phishing collected using a global honeynet.
