Automatically determining phishing campaigns using the USCAP methodology

Phishing fraudsters attempt to create an environment that looks and feels like a legitimate institution while also trying to evade the filters and suspicions of their targets. This is a difficult compromise for phishers, and it exposes a weakness in how the fraud is carried out. This research presents a methodology that examines the differences between phishing websites from an authorship analysis perspective and uses them to identify the distinct phishing campaigns undertaken by phishing groups. The methodology is named USCAP (Unsupervised SCAP): it builds on the SCAP methodology from supervised authorship analysis and extends it to unsupervised learning problems. The source code of phishing websites is examined to generate a model that gives the size and scope of each recognized phishing campaign. With USCAP, phishing websites are for the first time clustered by campaign automatically and reliably, whereas previous methods relied on costly expert analysis of phishing websites. Evaluation of these clusters indicates that each cluster is strongly consistent, showing high stability and reliability when analyzed against new information about the attacks, such as the dates on which the attacks occurred. The clusters found are indicative of different phishing campaigns and represent a step towards an automated phishing authorship analysis methodology.
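The abstract describes USCAP only at a high level. As a rough illustration, the sketch below assumes the usual SCAP ingredients: each document is reduced to a profile of its L most frequent byte-level n-grams, and two profiles are compared with the Simplified Profile Intersection (SPI), the count of n-grams they share. It then pairs these with a simple unsupervised grouping step. The normalization of SPI by the smaller profile, the threshold-based single-link grouping, the default values of `n` and `L`, and the names `scap_profile`, `spi`, and `cluster_by_spi` are all hypothetical choices for illustration, not the paper's actual clustering procedure or parameters.

```python
from collections import Counter
from itertools import combinations


def scap_profile(source: bytes, n: int = 3, L: int = 500) -> set[bytes]:
    """SCAP-style profile: the L most frequent byte-level n-grams of a document."""
    grams = Counter(source[i:i + n] for i in range(len(source) - n + 1))
    return {gram for gram, _ in grams.most_common(L)}


def spi(a: set[bytes], b: set[bytes]) -> int:
    """Simplified Profile Intersection: number of n-grams shared by two profiles."""
    return len(a & b)


def cluster_by_spi(pages: dict[str, bytes], threshold: float,
                   n: int = 3, L: int = 500) -> list[set[str]]:
    """Hypothetical unsupervised step: single-link grouping of pages whose
    normalized SPI (shared n-grams / smaller profile size) meets `threshold`."""
    profiles = {name: scap_profile(src, n, L) for name, src in pages.items()}
    parent = {name: name for name in pages}  # union-find forest

    def find(x: str) -> str:
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(pages, 2):
        pa, pb = profiles[a], profiles[b]
        denom = min(len(pa), len(pb)) or 1  # avoid division by zero on tiny pages
        if spi(pa, pb) / denom >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups: dict[str, set[str]] = {}
    for name in pages:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


if __name__ == "__main__":
    pages = {
        "a.html": b"<html><form action='login.php'>bank login</form></html>",
        "b.html": b"<html><form action='login.php'>bank login!</form></html>",
        "c.html": b"totally different content here xyz",
    }
    print(cluster_by_spi(pages, threshold=0.5))
```

In this toy input the two near-identical login pages share almost all of their 3-grams and end up in one group, while the unrelated page stays on its own. Single-link grouping is used only because it is short to write; a real study would likely want a clustering method and threshold chosen and validated against labelled or external evidence, such as the attack dates mentioned above.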
