Regap: a Tool for Unicode-based Web Identity Fraud Detection

ABSTRACT We anticipate the widespread use of internationalized resource identifiers (IRIs) 1 and internationalized domain names (IDNs) 2 on the web as a complement to the uniform resource identifier (URI). An IRI/IDN is composed of characters from a subset of Unicode, which exposes it to Unicode attacks 3: a phishing IRI/IDN can closely resemble a legitimate one, either visually or semantically. Phishing attacks based on this strategy are likely to become common as IRI/IDN adoption grows. We propose a method to detect such attacks. We construct a Unicode character similarity list (UC-SimList) based on character-to-character visual and semantic similarity, and use a nondeterministic finite automaton (NFA) 4 to identify potential IRI/IDN-based phishing patterns. We implemented a phishing IRI/IDN pattern generation tool, REGAP, which generates phishing IRI/IDN patterns as regular expressions (RE) for phishing IRI/IDN detection. We ...
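To make the idea concrete, the following is a minimal sketch of how a character similarity list could be expanded into a regular expression that matches look-alike variants of a legitimate domain name. It is an illustration under stated assumptions, not the REGAP implementation: the table `SIM_LIST` is a tiny hand-picked stand-in for the UC-SimList, the names `build_pattern` and `is_suspicious` are hypothetical, and the sketch relies on Python's `re` character classes rather than an explicitly constructed NFA. It also covers only one-to-one character substitutions.

```python
import re

# Hypothetical excerpt of a similarity list: each ASCII character maps to
# Unicode characters that look like it. The real UC-SimList is assumed to be
# far larger and to cover semantic similarity as well.
SIM_LIST = {
    "a": ["\u0430", "\u00e0", "\u00e1"],  # Cyrillic a, a with grave, a with acute
    "c": ["\u0441"],                      # Cyrillic es
    "e": ["\u0435", "\u00e9"],            # Cyrillic ie, e with acute
    "o": ["\u043e", "\u03bf", "0"],       # Cyrillic o, Greek omicron, digit zero
    "p": ["\u0440"],                      # Cyrillic er
}


def build_pattern(legit: str) -> re.Pattern:
    """Build a regex matching the legitimate name and its look-alike variants."""
    parts = []
    for ch in legit:
        alternatives = [ch] + SIM_LIST.get(ch, [])
        if len(alternatives) == 1:
            # No known look-alikes: match the character literally.
            parts.append(re.escape(ch))
        else:
            # One character class per position, listing every alternative.
            parts.append("[" + "".join(re.escape(a) for a in alternatives) + "]")
    return re.compile("".join(parts))


def is_suspicious(candidate: str, legit: str) -> bool:
    """True if candidate differs from legit but matches its look-alike pattern."""
    if candidate == legit:
        return False
    return build_pattern(legit).fullmatch(candidate) is not None


if __name__ == "__main__":
    spoof = "p\u0430ypal.com"  # second letter is Cyrillic a, not Latin a
    print(is_suspicious(spoof, "paypal.com"))
    print(is_suspicious("paypal.com", "paypal.com"))
```

Building one character class per position keeps the pattern linear in the length of the name, even though the number of spoofed strings it accepts grows exponentially with the number of substitutable positions.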