A Word-Level Analytical Approach for Identifying Malicious Domain Names Caused by Dictionary-Based DGA Malware

Computer networks face serious threats from malware that uses sophisticated domain generation algorithms (DGAs). Dictionary-based DGA malware evades detection by dynamically generating domain names as concatenations of dictionary words. In this paper, we propose an approach for identifying the callback communications of such malware by analyzing its domain names at the word level. The approach rests on the following observation: each malware family uses its own dictionary and algorithm to generate domain names, so the word usage in malware-generated domains differs distinctly from that in human-generated domains. In our evaluation on labeled datasets, the proposed approach achieves accuracy, recall, and precision as high as 0.9989, 0.9977, and 0.9869, respectively. We also clarify the functional differences between our approach and other published methods through qualitative comparisons. Taken together, these results suggest that malware-infected machines can be identified and removed from networks by using DNS queries for detected malicious domain names as triggers. Our approach thus contributes to network security by providing a technique for countering malware that relies on such domain names.
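
To make the idea of word-level analysis concrete, the following minimal sketch splits a domain label into dictionary words. It is an illustration only and not the method evaluated in this paper: the dictionary `TOY_DICTIONARY`, the helper names `segment` and `second_level_label`, and the assumption of a single-label TLD are all hypothetical choices made for the example.

```python
from typing import List, Optional, Set

# Hypothetical toy dictionary (assumption); a realistic one would be far larger.
TOY_DICTIONARY: Set[str] = {"blue", "house", "table", "green", "river", "stone"}


def second_level_label(domain: str) -> str:
    """Return the label directly left of the TLD.

    Assumes a single-label TLD such as ".com"; multi-label public
    suffixes such as ".co.uk" are not handled in this sketch.
    """
    parts = domain.lower().rstrip(".").split(".")
    return parts[-2] if len(parts) >= 2 else parts[0]


def segment(label: str, dictionary: Set[str]) -> Optional[List[str]]:
    """Split `label` into dictionary words, or return None if no split exists."""
    # best[i] holds one segmentation of label[:i], or None if there is none.
    best: List[Optional[List[str]]] = [None] * (len(label) + 1)
    best[0] = []
    for end in range(1, len(label) + 1):
        for start in range(end):
            prefix = best[start]
            word = label[start:end]
            if prefix is not None and word in dictionary:
                best[end] = prefix + [word]
                break
    return best[len(label)]


if __name__ == "__main__":
    for name in ("bluehouse.com", "greenriverstone.net", "xkqjzv.org"):
        print(name, segment(second_level_label(name), TOY_DICTIONARY))
```

A label that segments completely into words is a candidate for word-level analysis, while a label with no segmentation yields `None`. A detector would then compare the resulting word usage against that of human-generated domains, a step this sketch does not attempt.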
