Ensemble-based Feature Selection and Classification Model for DNS Typo-squatting Detection

The Domain Name System (DNS) plays an important role in the current IP-based Internet architecture because it resolves domain names to IP addresses. However, the DNS protocol has several security vulnerabilities due to its lack of data integrity and origin authentication. This paper focuses on one particular security vulnerability, namely typo-squatting. Typo-squatting refers to the registration of a domain name that is extremely similar to that of an existing popular brand with the goal of redirecting users to malicious/suspicious websites. The danger of typo-squatting is that it can lead to information threats and corporate secret leakage, and can facilitate fraud. This paper builds on our previous work in [1], which proposed only a majority-voting-based classifier, by proposing an ensemble-based feature selection and bagging classification model to detect DNS typo-squatting attacks. Experimental results show that the proposed framework achieves high accuracy and precision in identifying malicious/suspicious typo-squatting domains (a loss of at most 1.5% in accuracy and 5% in precision compared with the model that used the complete feature set) while having a lower computational complexity due to the smaller feature set (a reduction of more than 50% in feature set size).
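The two ideas named in the abstract, ensemble-based feature selection followed by a bagging classifier, can be illustrated with a minimal standard-library Python sketch. The abstract does not say which selectors or base learners the paper uses, so everything here is an assumption chosen for illustration: the three scoring functions (absolute Pearson correlation, class-mean gap, single-threshold rule accuracy), the decision-stump base learner, the function names, and the toy data are all hypothetical and are not the authors' method.

```python
import random
from collections import Counter

def pearson_score(col, y):
    """Absolute Pearson correlation between one feature column and the labels."""
    n = len(col)
    mx, my = sum(col) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(col, y))
    vx = sum((a - mx) ** 2 for a in col)
    vy = sum((b - my) ** 2 for b in y)
    return abs(cov) / (vx * vy) ** 0.5 if vx and vy else 0.0

def mean_gap_score(col, y):
    """Gap between the two class means, scaled by the feature's range."""
    pos = [a for a, b in zip(col, y) if b == 1]
    neg = [a for a, b in zip(col, y) if b == 0]
    spread = max(col) - min(col)
    if not pos or not neg or spread == 0:
        return 0.0
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg)) / spread

def best_stump(col, y):
    """Best single-threshold rule on one feature: (accuracy, threshold, polarity)."""
    vals = sorted(set(col))
    # Candidate cuts are midpoints between consecutive distinct values.
    cuts = [(a + b) / 2 for a, b in zip(vals, vals[1:])] or vals
    best = (-1.0, cuts[0], 1)
    for t in cuts:
        for pol in (1, -1):
            hits = sum((1 if pol * (a - t) > 0 else 0) == b for a, b in zip(col, y))
            if hits / len(y) > best[0]:
                best = (hits / len(y), t, pol)
    return best

def stump_score(col, y):
    return best_stump(col, y)[0]

def ensemble_select(X, y, k, min_votes=2):
    """Keep the features ranked in the top k by at least min_votes selectors."""
    cols = list(zip(*X))
    votes = Counter()
    for scorer in (pearson_score, mean_gap_score, stump_score):
        ranked = sorted(range(len(cols)), key=lambda j: scorer(cols[j], y), reverse=True)
        votes.update(ranked[:k])
    return sorted(j for j, v in votes.items() if v >= min_votes)

def fit_bagging(X, y, features, n_estimators=15, seed=0):
    """Train one stump per bootstrap resample, restricted to the selected features."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        yb = [y[i] for i in idx]
        best = None
        for j in features:
            acc, t, pol = best_stump([X[i][j] for i in idx], yb)
            if best is None or acc > best[0]:
                best = (acc, j, t, pol)
        models.append(best[1:])
    return models

def predict(models, row):
    """Majority vote over the bagged stumps (1 = suspicious, 0 = benign)."""
    votes = sum(1 for j, t, pol in models if pol * (row[j] - t) > 0)
    return 1 if 2 * votes > len(models) else 0

# Made-up toy data: columns 0 and 1 track the label, columns 2 and 3 do not.
X, y = [], []
for i in range(20):
    label = i % 2
    X.append([label + 0.01 * (i % 5),      # informative
              1 - label + 0.02 * (i % 3),  # informative (inverted)
              (i // 2) % 2,                # uninformative
              (i // 2) / 10])              # uninformative
    y.append(label)

selected = ensemble_select(X, y, k=2)
models = fit_bagging(X, y, selected)
print("selected features:", selected)
print("training accuracy:", sum(predict(models, r) == t for r, t in zip(X, y)) / len(y))
```

On this toy data the vote keeps only the two informative columns, which mirrors the trade-off the abstract describes: a smaller feature set lowers the cost of training and prediction, at the price of a possible small loss in accuracy on real data.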

[1] Gulshan Kumar, et al. Feature Selection Approach for Intrusion Detection System, 2013.

[2] Rinkle Rani, et al. Classification of Cancerous Profiles Using Machine Learning, 2017, 2017 International Conference on Machine Learning and Data Science (MLDS).

[3] Seemab Latif, et al. Handling intrusion and DDoS attacks in Software Defined Networks using machine learning techniques, 2014, 2014 National Software Engineering Conference.

[4] B. Bonev. Feature Selection based on Information Theory, 2010.

[5] Kunle Olukotun, et al. Map-Reduce for Machine Learning on Multicore, 2006, NIPS.

[6] Abdallah Shami, et al. Dynamic SON-Enabled Location Management in LTE Networks, 2018, IEEE Transactions on Mobile Computing.

[7] Suphannee Sivakorn, et al. Countering Malicious Processes with Process-DNS Association, 2019, NDSS.

[8] Anestis Karasaridis. DNS Security, 2013.

[9] Ersin Namli, et al. A comparative assessment of bagging ensemble models for modeling concrete slump flow, 2015.

[10] Abdallah Shami, et al. On the Security of SDN: A Completed Secure and Scalable Framework Using the Software-Defined Perimeter, 2019, IEEE Access.

[11] Leyla Bilge, et al. EXPOSURE: Finding Malicious Domains Using Passive DNS Analysis, 2011, NDSS.

[12] Georgios Mantas, et al. An OAuth2-based protocol with strong user privacy preservation for smart city mobile e-Health apps, 2016, 2016 IEEE International Conference on Communications (ICC).

[13] Robert C. Holte, et al. Very Simple Classification Rules Perform Well on Most Commonly Used Datasets, 1993, Machine Learning.

[14] Abdallah Shami, et al. Software-Defined Perimeter (SDP): State of the Art Secure Solution for Modern Networks, 2019, IEEE Network.

[15] Kensuke Fukuda, et al. Detecting Malicious Activity With DNS Backscatter Over Time, 2017, IEEE/ACM Transactions on Networking.

[16] Shishir K. Shandilya, et al. Dynamic Recognition of Phishing URLs Using Deep Learning Techniques, 2020.

[17] Ting Yu, et al. A Survey on Malicious Domains Detection through DNS Data Analysis, 2018, ACM Comput. Surv.

[18] Geoffrey Holmes, et al. Feature selection via the discovery of simple classification rules, 1995.

[19] Mehmet S. Aktaş, et al. Data Feature Selection Methods on Distributed Big Data Processing Platforms, 2018, 2018 3rd International Conference on Computer Science and Engineering (UBMK).

[20] Abdelkader H. Ouda, et al. Resource allocation in a network-based cloud computing environment: design challenges, 2013, IEEE Communications Magazine.

[21] Abdallah Shami, et al. Performance Analysis of SDP For Secure Internal Enterprises, 2019, 2019 IEEE Wireless Communications and Networking Conference (WCNC).

[22] Abdallah Shami, et al. Bayesian Optimization with Machine Learning Algorithms Towards Anomaly Detection, 2018, 2018 IEEE Global Communications Conference (GLOBECOM).

[23] M. Aramudhan, et al. Feature selection based on information theory for pattern classification, 2014, 2014 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT).

[24] Chris J. Mitchell, et al. Security vulnerabilities in DNS and DNSSEC, 2007, The Second International Conference on Availability, Reliability and Security (ARES'07).

[25] Haleh Amintoosi, et al. DNS Tunneling Detection Method Based on Multilabel Support Vector Machine, 2018, Secur. Commun. Networks.

[26] Abdallah Shami, et al. Power-Aware Optimized RRH to BBU Allocation in C-RAN, 2018, IEEE Transactions on Wireless Communications.

[27] Yuchen Zhou, et al. Unsupervised Clustering for Identification of Malicious Domain Campaigns, 2018.

[28] Scott Rose, et al. DNS Security Introduction and Requirements, 2005, RFC.

[29] Reza Curtmola, et al. On the Performance and Analysis of DNS Security Extensions, 2005, CANS.

[30] Said El Kafhali, et al. DDoS attack detection using machine learning techniques in cloud computing environments, 2017, 2017 3rd International Conference of Cloud Computing Technologies and Applications (CloudTech).

[31] Abdallah Shami, et al. Tree-Based Intelligent Intrusion Detection System in Internet of Vehicles, 2019, 2019 IEEE Global Communications Conference (GLOBECOM).

[32] Pat Langley, et al. Models of Incremental Concept Formation, 1990, Artif. Intell.

[33] Hanan Lutfiyya, et al. DNS Typo-Squatting Domain Detection: A Data Analytics & Machine Learning Based Approach, 2018, 2018 IEEE Global Communications Conference (GLOBECOM).

[34] Mark A. Hall, et al. Correlation-based Feature Selection for Machine Learning, 2003.