DNS Typo-Squatting Domain Detection: A Data Analytics & Machine Learning Based Approach

The Domain Name System (DNS) is a crucial component of current IP-based networks, as it is the standard mechanism for name-to-IP resolution. However, because it lacks data integrity and origin authentication mechanisms, it is vulnerable to a variety of attacks. One such attack is typosquatting. Detecting this attack is particularly important, as it can threaten corporate secrets and can be used to steal information or commit fraud. In this paper, a machine learning-based approach is proposed to tackle the typosquatting vulnerability. To that end, exploratory data analytics is first used to better understand the trends observed in eight features extracted from domain names. Next, a majority voting-based ensemble learning classifier built from five classification algorithms is proposed that can detect suspicious domains with high accuracy. Finally, the observed trends are validated by studying the same features in an unlabeled dataset, first with the K-means clustering algorithm and then by applying the developed ensemble learning classifier. Results show that legitimate domains have shorter domain names and fewer unique characters. Moreover, the developed ensemble learning classifier performs better in terms of accuracy, precision, and F-score. Furthermore, clustering reveals similar trends. However, the number of domains that clustering flags as potentially suspicious is high. Hence, the ensemble learning classifier is applied, reducing the number of domains identified as potentially suspicious by almost a factor of five while preserving the same trends in the feature statistics.
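The abstract names only two of the eight lexical features: domain name length and number of unique characters. The sketch below shows how such features might be computed from a raw domain string. The two extra features (`digit_count`, `hyphen_count`) and the function name `domain_features` are hypothetical illustrations, not the paper's feature set, and counting characters over the full lowercased name (dots included) is likewise an assumption.

```python
def domain_features(domain: str) -> dict:
    """Compute illustrative lexical features of a domain name (hypothetical subset)."""
    # Drop a trailing root dot and lowercase, since DNS names are case-insensitive.
    name = domain.rstrip(".").lower()
    return {
        # Named in the abstract: total length and number of distinct characters.
        "length": len(name),
        "unique_chars": len(set(name)),
        # Assumed extra features (not from the paper): digits can mimic letters
        # (e.g. "0" for "o"), and hyphens are often used to append words.
        "digit_count": sum(c.isdigit() for c in name),
        "hyphen_count": name.count("-"),
    }
```

For example, `domain_features("google.com")` yields a length of 10 and 7 unique characters, while the look-alike `"g00gle-login.com"` yields a length of 16 and 11 unique characters, consistent with the reported trend that legitimate domains are shorter and use fewer distinct characters.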
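A majority voting ensemble combines the labels predicted by several base classifiers and outputs, for each sample, the label most of them agree on. The abstract does not say which five algorithms are used, so the minimal sketch below assumes they have already produced binary predictions (1 = suspicious, 0 = legitimate, an assumed encoding) and shows only the hard-voting step. The function name `majority_vote` is hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Hard majority voting; predictions[m][i] is model m's label for sample i."""
    if not predictions:
        raise ValueError("need at least one model")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must label the same number of samples")
    combined = []
    for i in range(n_samples):
        # Count how many models voted for each label on sample i.
        votes = Counter(p[i] for p in predictions)
        # With five models and two classes a tie cannot occur.
        label, _ = votes.most_common(1)[0]
        combined.append(label)
    return combined


# Five hypothetical models labelling three domains.
preds = [[1, 0, 1], [1, 0, 0], [0, 0, 1], [1, 1, 1], [0, 0, 1]]
print(majority_vote(preds))
```

Here the first domain gets three "suspicious" votes out of five, the second gets one, and the third gets four, so the combined output is `[1, 0, 1]`. Using an odd number of voters is what makes binary hard voting tie-free; in practice, a library class such as scikit-learn's `VotingClassifier` with `voting="hard"` implements the same rule on top of fitted estimators.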
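On the unlabeled dataset, K-means groups domains by feature similarity without using labels. The sketch below is plain Lloyd's algorithm on two-dimensional points (length, unique characters), which is an assumption: the paper clusters on its eight features, and its initialization and choice of k are not stated. Seeding with the first k points is a simplification; libraries typically default to k-means++ seeding (e.g. scikit-learn's `KMeans`). The function name `kmeans` and the sample points are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's k-means; initial centroids are the first k points (a simplification)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
        if new_labels == labels and new_centroids == centroids:
            break  # converged: assignments and centroids no longer change
        labels, centroids = new_labels, new_centroids
    return labels, centroids


# Hypothetical (length, unique_chars) pairs: three short names, three long ones.
pts = [(10, 7), (9, 6), (11, 7), (25, 15), (27, 16), (24, 14)]
print(kmeans(pts, 2))
```

On these points the algorithm separates the three short domains from the three long ones. With real data the features would normally be standardized first, since K-means uses raw Euclidean distance and larger-valued features would otherwise dominate.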
