A Superficial Analysis Approach for Identifying Malicious Domain Names Generated by DGA Malware

Some of the most serious security threats facing computer networks involve malware. To prevent malware-related damage, administrators must swiftly identify and remove infected machines that reside in their networks. However, many malware families use domain generation algorithms (DGAs) to avoid detection. A DGA frequently changes the domain name used for callback communication from an infected machine to its command-and-control server, which hides that communication. In this article, we propose an approach for estimating the randomness of domain names by superficially analyzing their character strings. This approach is based on the following observation: human-generated benign domain names tend to reflect the intent of their registrants, such as an organization, product, or content. In contrast, dynamically generated malicious domain names consist of meaningless character strings because conflicts with already registered domain names must be avoided; hence, there are discernible differences between the strings of dynamically generated and human-generated domain names. Notably, our approach does not require any prior knowledge about DGAs. Our evaluation indicates that the proposed approach achieves recall and precision as high as 0.9960 and 0.9029, respectively, on labeled datasets. The approach also proved highly effective on datasets collected from a campus network. These results suggest that malware-infected machines can be swiftly identified and removed from networks by using DNS queries for detected malicious domains as triggers.
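As a rough illustration of what estimating randomness from a domain's character string can look like, the sketch below scores a domain by the Shannon entropy of the characters in its leftmost label. This is a stand-in technique chosen for brevity, not the method proposed in the article, and the function names `char_entropy` and `randomness_score` are hypothetical. Scoring only the leftmost label is also a simplifying assumption; a practical system would need to handle public suffixes and subdomains.

```python
import math
from collections import Counter


def char_entropy(label: str) -> float:
    """Shannon entropy, in bits per character, of a string."""
    if not label:
        return 0.0
    counts = Counter(label)
    n = len(label)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def randomness_score(domain: str) -> float:
    """Entropy of the leftmost label of a domain name.

    Assumption for illustration only: the leftmost label carries the
    generated part of the name. This is not the article's scoring method.
    """
    label = domain.lower().rstrip(".").split(".")[0]
    return char_entropy(label)


if __name__ == "__main__":
    # Compare a human-chosen name with a made-up random-looking one.
    for name in ("google.com", "xkqjzvbnwp.com"):
        print(name, round(randomness_score(name), 3))
```

A label made of ten distinct characters scores higher than a short, repetitive human-chosen name, which matches the intuition described above. Entropy alone would likely miss dictionary-based DGAs, so it should be read as a toy baseline rather than a detector.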
