DomainScouter: Analyzing the Risks of Deceptive Internationalized Domain Names

Internationalized domain names (IDNs) are abused to create domain names that are visually similar to those of legitimate or popular brands. In this work, we systematize such domain names, which we call deceptive IDNs, and analyze the risks associated with them. In particular, we propose a new system called DomainScouter to detect various deceptive IDNs and calculate a deceptive IDN score, a new metric indicating the number of users that are likely to be misled by a deceptive IDN. We perform a comprehensive measurement study on the identified deceptive IDNs using over 4.4 million registered IDNs under 570 top-level domains (TLDs). The measurement results demonstrate that there are many previously unexplored deceptive IDNs that target non-English brands or combine other domain squatting methods. Furthermore, we conduct online surveys to examine and highlight vulnerabilities in user perceptions when encountering such IDNs. Finally, we discuss practical countermeasures that stakeholders can take against deceptive IDNs.

Key words: internationalized domain name (IDN), deceptive IDN, measurement, user study
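
To make the notion of a visually similar IDN concrete, the following minimal sketch shows how a look-alike name differs from the brand name it imitates. It is an illustration only and is not DomainScouter's detection method. The helper names (to_ascii, scripts), the example domain, and the script heuristic based on Unicode character names are all assumptions introduced here.

```python
import unicodedata

def to_ascii(domain: str) -> str:
    """Convert a domain to its ASCII (Punycode) form using Python's built-in IDNA codec."""
    return domain.encode("idna").decode("ascii")

def scripts(label: str) -> set:
    """Crude script guess: first word of each letter's Unicode name (assumption, not a standard)."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}

# Hypothetical look-alike: the first letter is CYRILLIC SMALL LETTER A (U+0430),
# which renders almost identically to LATIN SMALL LETTER A in many fonts.
brand = "apple.com"
lookalike = "\u0430pple.com"

print(lookalike == brand)            # the two strings are different
print(to_ascii(brand))               # a pure-ASCII name is unchanged
print(to_ascii(lookalike))           # the IDN is encoded with an "xn--" prefix
print(scripts(lookalike.split(".")[0]))  # mixed scripts within a single label
```

A label that mixes scripts in this way is one simple signal of a possible homograph. It will not catch deceptive IDNs written entirely in one non-Latin script, so it should not be read as a complete detector.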
