Accurate spear phishing campaign attribution and early detection

There is growing evidence that spear phishing campaigns are increasingly pervasive, sophisticated, and remain the starting points of more advanced attacks. Current campaign identification and attribution process heavily relies on manual efforts and is inefficient in gathering intelligence in a timely manner. It is ideal that we can automatically attribute spear phishing emails to known campaigns and achieve early detection of new campaigns using limited labelled emails as the seeds. In this paper, we introduce four categories of email profiling features that capture various characteristics of spear phishing emails. Building on these features, we implement and evaluate an affinity graph based semi-supervised learning model for campaign attribution and detection. We demonstrate that our system, using only 25 labelled emails, achieves 0.9 F1 score with a 0.01 false positive rate in known campaign attribution, and is able to detect previously unknown spear phishing campaigns, achieving 100% 'darkmoon', over 97% of 'samkams' and 91% of 'bisrala' campaign detection using 246 labelled emails in our experiments.

[1]  Susan T. Dumais,et al.  Improving information retrieval using latent semantic indexing , 1988 .

[2]  Harris Drucker,et al.  Support vector machines for spam categorization , 1999, IEEE Trans. Neural Networks.

[3]  Bernhard Schölkopf,et al.  Support Vector Method for Novelty Detection , 1999, NIPS.

[4]  Zoubin Ghahramani,et al.  Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions , 2003, ICML 2003.

[5]  Christopher Abad,et al.  The economy of phishing: A survey of the operations of the phishing market , 2005, First Monday.

[6]  Markus Jakobsson,et al.  Phishing and Countermeasures: Understanding the Increasing Problem of Electronic Identity Theft , 2006 .

[7]  Santosh S. Vempala,et al.  Filtering spam with behavioral blacklisting , 2007, CCS '07.

[8]  Alexander G. Gray,et al.  Detecting Spammers with SNARE: Spatio-temporal Network-level Automatic Reputation Engine , 2009, USENIX Security Symposium.

[9]  Enrico Blanzieri,et al.  A survey of learning-based techniques of email spam filtering , 2008, Artificial Intelligence Review.

[10]  Chris Kanich,et al.  Botnet Judo: Fighting Spam with Itself , 2010, NDSS.

[11]  Carolyn Penstein Rosé,et al.  CANTINA+: A Feature-Rich Machine Learning Framework for Detecting Phishing Web Sites , 2011, TSEC.

[12]  Jason Hong,et al.  The state of phishing attacks , 2012, Commun. ACM.

[13]  Rui Chen,et al.  Research Article Phishing Susceptibility: An Investigation Into the Processing of a Targeted Spear Phishing Email , 2012, IEEE Transactions on Professional Communication.

[14]  Dawn Xiaodong Song,et al.  On the Feasibility of Internet-Scale Author Identification , 2012, 2012 IEEE Symposium on Security and Privacy.

[15]  Leyla Bilge,et al.  Industrial Espionage and Targeted Attacks: Understanding the Characteristics of an Escalating Threat , 2012, RAID.

[16]  Rachel Greenstadt,et al.  Detecting Hoaxes, Frauds, and Deception in Writing Style Online , 2012, 2012 IEEE Symposium on Security and Privacy.

[17]  Caroline Petitjean,et al.  One class random forests , 2013, Pattern Recognit..

[18]  Robert E. Mercer,et al.  Classifying Spam Emails Using Text and Readability Features , 2013, 2013 IEEE 13th International Conference on Data Mining.

[19]  Youssef Iraqi,et al.  Phishing Detection: A Literature Survey , 2013, IEEE Communications Surveys & Tutorials.

[20]  Shari Lawrence Pfleeger,et al.  Going Spear Phishing: Exploring Embedded Training and Awareness , 2014, IEEE Security & Privacy.

[21]  Gianluca Stringhini,et al.  The Tricks of the Trade: What Makes Spam Campaigns Successful? , 2014, 2014 IEEE Security and Privacy Workshops.

[22]  Mark Allman,et al.  A large-scale empirical analysis of email spam detection through network characteristics in a stand-alone enterprise , 2014, Comput. Networks.

[23]  Gianluca Stringhini,et al.  That Ain't You: Blocking Spearphishing Through Behavioral Modelling , 2015, DIMVA.

[24]  Adel Bouhoula,et al.  Behavior-based approach to detect spam over IP telephony attacks , 2015, International Journal of Information Security.

[25]  Yada Zhu,et al.  Social Phishing , 2018, Encyclopedia of Social Network Analysis and Mining. 2nd Ed..