High-performance content-based phishing attack detection

Phishers continually alter the source code of the web pages used in their attacks, both to mirror changes to the legitimate websites of spoofed organizations and to evade phishing countermeasures. These manipulations range from subtle source code changes to the conspicuous addition or removal of significant content. To respond to such changes in phishing campaigns, a set of file matching algorithms is implemented to detect phishing websites based on their content and is evaluated on a custom data set of 17,992 phishing attacks targeting 159 different brands. Experiments with several content-based approaches show that some achieve a detection rate greater than 90% while maintaining a low false positive rate.
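As a rough illustration of content-based matching, the sketch below compares a candidate page against known phishing pages, first by exact MD5 hash and then by line-level similarity. It is not the implementation evaluated here: the abstract does not name the specific algorithms, so the function names, the use of Python's difflib, and the 0.75 threshold are all assumptions chosen for illustration.

```python
import hashlib
from difflib import SequenceMatcher


def md5_hex(page_source: str) -> str:
    """Return the MD5 digest of a page's source as a hex string."""
    return hashlib.md5(page_source.encode("utf-8")).hexdigest()


def line_similarity(candidate: str, known: str) -> float:
    """Similarity in [0, 1] computed over the two pages' lines."""
    return SequenceMatcher(None, candidate.splitlines(), known.splitlines()).ratio()


def classify(candidate: str, known_phish: dict, threshold: float = 0.75):
    """Return (label, score) for the best-matching known phish.

    known_phish maps a hypothetical brand label to page source.
    The label is None when no page reaches the (illustrative) threshold.
    """
    candidate_hash = md5_hex(candidate)
    best_label, best_score = None, 0.0
    for label, source in known_phish.items():
        # An exact hash match means the content is byte-for-byte identical.
        if md5_hex(source) == candidate_hash:
            return label, 1.0
        # Otherwise fall back to approximate matching on lines.
        score = line_similarity(candidate, source)
        if score > best_score:
            best_label, best_score = label, score
    if best_score >= threshold:
        return best_label, best_score
    return None, best_score


known = {"brand_a": "<html>\n<title>Login</title>\n<form>\n</form>\n</html>"}
altered = "<html>\n<title>Log in</title>\n<form>\n</form>\n</html>"
label, score = classify(altered, known)
```

In this toy case the altered page no longer shares an MD5 hash with the known phish, but most of its lines still match, so the approximate comparison still associates it with the same label. A real deployment would need a threshold tuned against legitimate pages to keep the false positive rate low.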
