ShamFinder: An Automated Framework for Detecting IDN Homographs

The internationalized domain name (IDN) mechanism enables Unicode characters to be used in domain names. The Unicode character set contains several pairs of characters that are visually identical to each other, e.g., the Latin character 'a' (U+0061) and the Cyrillic character 'а' (U+0430). Such visually identical characters are generally known as homoglyphs. IDN homograph attacks, which are widely known, abuse Unicode homoglyphs to create lookalike URLs. Although the threat posed by IDN homograph attacks is not new, the recent rise in IDN adoption by both domain name registries and web browsers has made these attacks increasingly widespread, leading to large-scale phishing campaigns such as those targeting cryptocurrency exchange companies. In this work, we developed ShamFinder, an automated framework for detecting IDN homographs. Our key contribution is the automatic construction of a homoglyph database, which can be used both as a direct countermeasure against the attack and to inform users about the context of an IDN homograph. Using ShamFinder, we perform a large-scale measurement study that aims to understand the IDN homographs that exist in the wild. On the basis of our approach, we provide insights into effective countermeasures against the threats posed by IDN homograph attacks.
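To make the threat model concrete, here is a minimal Python sketch. It is not ShamFinder's implementation. It assumes a tiny hand-written homoglyph table (`HOMOGLYPHS`) standing in for an automatically constructed database. Under that assumption, a domain is flagged as a homograph of a reference domain when the two strings differ but map to the same ASCII "skeleton". The sketch also shows the Punycode (`xn--`) form that registries and DNS actually handle, produced by Python's built-in IDNA 2003 `idna` codec. The names `HOMOGLYPHS`, `skeleton`, `is_homograph`, and `to_ascii` are hypothetical.

```python
# Hypothetical, hand-picked homoglyph table: maps a non-ASCII code point to the
# ASCII letter it resembles. A real homoglyph database would be far larger;
# these few entries are illustrative only.
HOMOGLYPHS = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0455": "s",  # CYRILLIC SMALL LETTER DZE
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}


def skeleton(domain: str) -> str:
    """Replace every character found in the (hypothetical) table with its ASCII lookalike."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in domain)


def is_homograph(idn: str, reference: str) -> bool:
    """True if idn is a different string that looks like reference under the table."""
    idn, reference = idn.lower(), reference.lower()
    return idn != reference and skeleton(idn) == skeleton(reference)


def to_ascii(domain: str) -> str:
    """Encode a Unicode domain to its ASCII form using Python's IDNA 2003 codec."""
    return domain.encode("idna").decode("ascii")


if __name__ == "__main__":
    spoof = "\u0430pple.com"  # first letter is Cyrillic U+0430, the rest is Latin
    print(is_homograph(spoof, "apple.com"))  # True
    print(to_ascii(spoof))  # the xn-- label a resolver would actually see
```

A one-to-one character map like this only catches single-character substitutions that the table already covers. Its detection power therefore depends entirely on the coverage of the homoglyph database, which is why constructing that database automatically matters.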
