Web Phishing Detection Based on Page Spatial Layout Similarity

Web phishing is an increasingly severe security threat on the web. Effective and efficient phishing detection is essential for protecting web users from the loss of sensitive private information and even personal property. One key to phishing detection is to efficiently search a library of legitimate web pages and find those pages that are most similar to a suspicious page. Most existing phishing detection methods focus on text and/or image features and pay very limited attention to the spatial layout characteristics of web pages. In this paper, we propose a novel phishing detection method that makes use of these spatial layout characteristics. In particular, we develop two different options for extracting spatial layout features, represented as rectangular blocks, from a given web page. Given two web pages and their respective spatial layout features, we propose a page similarity definition that takes their spatial layout characteristics into account. Furthermore, we build an R-tree to index all the spatial layout features of a legitimate page library. As a result, phishing detection based on spatial layout feature similarity is facilitated by relevant spatial queries via the R-tree. A series of simulation experiments is conducted to evaluate our proposals. The results demonstrate that the proposed phishing detection method is effective and efficient.
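To make the idea of comparing pages by their rectangular layout blocks concrete, the following is a minimal illustrative sketch, not the similarity definition proposed in the paper, which the abstract does not spell out. It assumes each page has already been reduced to a list of axis-aligned rectangles, scores two pages by a symmetric average of best-match intersection-over-union (IoU), and scans the legitimate library linearly where the paper would instead issue spatial queries against an R-tree. The names `Block`, `iou`, `page_similarity`, and `most_similar` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """An axis-aligned layout rectangle: top-left corner plus width and height."""
    x: float
    y: float
    w: float
    h: float


def iou(a: Block, b: Block) -> float:
    """Intersection-over-union of two rectangles, in [0, 1]."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def page_similarity(p: list[Block], q: list[Block]) -> float:
    """Assumed similarity: symmetric mean of each block's best IoU on the other page."""
    if not p or not q:
        return 0.0

    def one_way(src: list[Block], dst: list[Block]) -> float:
        return sum(max(iou(s, d) for d in dst) for s in src) / len(src)

    return 0.5 * (one_way(p, q) + one_way(q, p))


def most_similar(suspect: list[Block], library: dict[str, list[Block]]) -> str:
    """Brute-force stand-in for the R-tree lookup: scan every legitimate page."""
    return max(library, key=lambda name: page_similarity(suspect, library[name]))


# Toy data: two legitimate layouts and a suspect that copies the first one.
bank = [Block(0, 0, 100, 20), Block(0, 20, 50, 80)]
news = [Block(200, 200, 10, 10)]
library = {"bank": bank, "news": news}
suspect = [Block(0, 0, 100, 20), Block(0, 20, 50, 80)]
best = most_similar(suspect, library)
```

In this toy setup a suspect page whose layout is closest to `bank` but which is served from a different domain would be flagged for further inspection. The linear scan grows with the size of the library, which is the cost the R-tree index described above is meant to avoid.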
