Algorithm of web page similarity comparison based on visual block

Phishing pages often deceive users because their layout closely resembles that of the genuine pages, and they cause considerable losses to society. Consequently, detecting phishing sites has become an urgent task. By studying phishing web pages built from screenshots, we find that such pages embed numerous screenshots of the genuine page to achieve close visual similarity while evading text- and structure-based similarity detection. This study introduces a new similarity matching algorithm based on visual blocks. First, the RenderLayer tree of the web page is obtained to extract the visual blocks. Second, an algorithm is designed to clean up the jumbled visual blocks by deleting small visual blocks and merging overlapping visual blocks. Finally, the similarity between two web pages is assessed. The algorithm is evaluated with different thresholds to find the optimal miss and false alarm rates.
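The abstract names the processing steps but not the block representation, the size cut-off, the merge rule, or the similarity measure. The Python sketch below illustrates the cleanup and comparison stages under explicit assumptions: visual blocks are axis-aligned rectangles (extracting them from the RenderLayer tree is out of scope here), a block counts as small when its area falls below a hypothetical `min_area`, overlapping blocks are replaced by their bounding rectangle, and page similarity is a hypothetical symmetric average of best-match intersection-over-union (IoU) scores. All names (`Block`, `clean_blocks`, `page_similarity`, `is_suspicious`) and default values are illustrative, not the authors' method.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    """Hypothetical visual block: top-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float

    @property
    def area(self) -> float:
        return self.w * self.h


def overlaps(a: Block, b: Block) -> bool:
    # Strict inequalities: blocks that merely share an edge do not overlap.
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.h and b.y < a.y + a.h)


def bounding_union(a: Block, b: Block) -> Block:
    # Smallest block that covers both a and b (assumed merge rule).
    x1, y1 = min(a.x, b.x), min(a.y, b.y)
    x2 = max(a.x + a.w, b.x + b.w)
    y2 = max(a.y + a.h, b.y + b.h)
    return Block(x1, y1, x2 - x1, y2 - y1)


def clean_blocks(blocks: List[Block], min_area: float = 100.0) -> List[Block]:
    # Step 1 (hypothetical cut-off): delete blocks smaller than min_area.
    kept = [b for b in blocks if b.area >= min_area]
    # Step 2: merge overlapping blocks, repeating until no two blocks overlap,
    # because a merged block may now overlap a block it did not touch before.
    changed = True
    while changed:
        changed = False
        result: List[Block] = []
        for b in kept:
            for i, r in enumerate(result):
                if overlaps(b, r):
                    result[i] = bounding_union(b, r)
                    changed = True
                    break
            else:
                result.append(b)
        kept = result
    return kept


def iou(a: Block, b: Block) -> float:
    # Intersection-over-union of two blocks, in [0, 1].
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union_area = a.area + b.area - inter
    return inter / union_area if union_area > 0 else 0.0


def page_similarity(p: List[Block], q: List[Block]) -> float:
    # Hypothetical symmetric score: average best-match IoU over the blocks
    # of both pages. 1.0 means identical layouts, 0.0 means no overlap.
    if not p and not q:
        return 1.0
    if not p or not q:
        return 0.0
    best_p = sum(max(iou(a, b) for b in q) for a in p)
    best_q = sum(max(iou(a, b) for a in p) for b in q)
    return (best_p + best_q) / (len(p) + len(q))


def is_suspicious(p: List[Block], q: List[Block], threshold: float = 0.8) -> bool:
    # Flag page p if its cleaned layout is at least `threshold` similar to the
    # genuine page q. A higher threshold gives fewer false alarms, more misses.
    return page_similarity(clean_blocks(p), clean_blocks(q)) >= threshold


if __name__ == "__main__":
    genuine = [Block(0, 0, 800, 100), Block(0, 120, 400, 300)]
    # Same layout plus a tiny 3x3 block that the cleanup step discards.
    phish = genuine + [Block(5, 5, 3, 3)]
    print(is_suspicious(phish, genuine))  # True
```

Sweeping `threshold` over a range of values and counting misses and false alarms on labeled page pairs is one way to carry out the kind of threshold selection the abstract describes; the 0.8 default here is only a placeholder.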
