The methodology and an application to fight against Unicode attacks

Unicode is becoming a dominant character representation format for information processing. This creates a serious usability and security problem for many applications, because many characters in the Universal Character Set (UCS) are visually and/or semantically similar to one another. Malicious parties can exploit this similarity to carry out Unicode attacks, including spam, phishing, and web identity attacks. In this paper, we examine these potential attacks and propose a methodology for countering them. To evaluate the feasibility of our methodology, we construct a Unicode Character Similarity List (UC-SimList). We then implement a visual- and semantic-based edit distance (VSED) and a visual- and semantic-based Knuth-Morris-Pratt algorithm (VSKMP) to detect Unicode attacks. We develop a prototype Unicode attack detection tool, IDN-SecuChecker, which detects phishing web links and fake user name (account) attacks. We also discuss possible practical uses of Unicode attack detectors.
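
The idea of a similarity-aware edit distance and a similarity-aware pattern search can be illustrated with a minimal Python sketch. It is not the paper's VSED or VSKMP implementation. The table `SIMILAR` is a tiny hypothetical stand-in for a similarity list such as UC-SimList. The function names and the choice of zero cost for look-alike substitutions are also assumptions made for illustration.

```python
# Hypothetical miniature similarity table: each character maps to a canonical
# look-alike. This is an illustrative assumption, not the actual UC-SimList.
SIMILAR = {
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0441": "c",  # Cyrillic small es
    "\u0131": "i",  # Latin small dotless i
}


def canonical(ch):
    """Map a character to its canonical look-alike (itself if not listed)."""
    return SIMILAR.get(ch, ch)


def substitution_cost(a, b, similar_cost=0.0):
    """0 for identical characters, similar_cost for look-alikes, 1 otherwise."""
    if a == b:
        return 0.0
    if canonical(a) == canonical(b):
        return similar_cost
    return 1.0


def similarity_edit_distance(s, t, similar_cost=0.0):
    """Levenshtein distance in which look-alike substitutions are discounted."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, cs in enumerate(s, 1):
        curr = [float(i)]
        for j, ct in enumerate(t, 1):
            curr.append(min(
                prev[j] + 1.0,                                    # deletion
                curr[j - 1] + 1.0,                                # insertion
                prev[j - 1] + substitution_cost(cs, ct, similar_cost),
            ))
        prev = curr
    return prev[-1]


def looks_like(candidate, target):
    """True if candidate differs from target yet is at similarity distance 0."""
    return candidate != target and similarity_edit_distance(candidate, target) == 0.0


def find_similar(text, pattern):
    """Knuth-Morris-Pratt search comparing canonical forms of characters.

    Returns the start indices in text where a look-alike of pattern occurs.
    """
    p = [canonical(c) for c in pattern]
    if not p:
        return []
    # Failure table: length of the longest proper border of each prefix.
    fail = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        while k and p[i] != p[k]:
            k = fail[k - 1]
        if p[i] == p[k]:
            k += 1
        fail[i] = k
    hits = []
    k = 0
    for i, ch in enumerate(text):
        c = canonical(ch)
        while k and c != p[k]:
            k = fail[k - 1]
        if c == p[k]:
            k += 1
        if k == len(p):
            hits.append(i - k + 1)
            k = fail[k - 1]
    return hits
```

In this sketch, `looks_like("p\u0430ypal", "paypal")` flags a name whose second letter is Cyrillic, and `find_similar` locates such a look-alike inside a longer string such as a link. Comparing canonical forms keeps character matching an equivalence relation, which the Knuth-Morris-Pratt failure table relies on. A graded similarity score would need a different matching strategy.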
