Stealthy Domain Generation Algorithms

Botnets are groups of compromised computers that botmasters (also called botherders) use to launch attacks over the Internet. To avoid detection, botnets use DNS fast flux to periodically change the mapping between domain names and IP addresses. Domain generation algorithms (DGAs) generate large numbers of domain names, which bots query to locate their command-and-control servers. Several techniques have been proposed to identify malicious domain names generated by DGAs. Three metrics, Kullback–Leibler (KL) distance, edit distance (ED), and Jaccard index (JI), detect botnet domains with up to a 100% detection rate and a 2.5% false-positive rate. In this paper, we propose two DGAs, one based on hidden Markov models (HMMs) and one based on probabilistic context-free grammars (PCFGs). Experimental results show that both the DGA detection metrics (KL, JI, and ED) and existing detection systems (BotDigger and Pleiades) have difficulty detecting domain names generated by the proposed approaches. We use game theory to optimize the strategies of both botmasters and security personnel. The results show that the optimal strategy for security personnel is to use ED detection with probability 0.78 and JI detection with probability 0.22, while the optimal strategy for botmasters is to choose the HMM-based DGA with probability 0.67 and the PCFG-based DGA with probability 0.33.
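The abstract names three detection metrics without defining them. The sketch below shows one plausible way to compute each over domain-name labels. It assumes character-unigram distributions for KL (in a symmetric form, with small-probability smoothing), whole-label Levenshtein distance for ED, and character-bigram sets for JI. Real detectors apply these metrics to groups of domains against a benign baseline and use tuned thresholds, none of which is shown here. All function names are hypothetical.

```python
import math
from collections import Counter


def char_distribution(domains):
    """Relative frequency of each character over a group of domain labels."""
    counts = Counter(ch for d in domains for ch in d)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def kl_distance(p, q, eps=1e-6):
    """Symmetric KL distance 0.5 * (D(p||q) + D(q||p)).

    Assumption: characters missing from one distribution get probability
    eps so the log ratio stays finite (smoothed values are not renormalized).
    """
    symbols = set(p) | set(q)

    def kl(a, b):
        return sum(a.get(s, eps) * math.log(a.get(s, eps) / b.get(s, eps))
                   for s in symbols)

    return 0.5 * (kl(p, q) + kl(q, p))


def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def bigrams(s):
    """Set of overlapping two-character substrings of s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard_index(a, b):
    """|A ∩ B| / |A ∪ B| over the character-bigram sets of two labels."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


print(edit_distance("kitten", "sitting"))  # prints 3
print(jaccard_index("google", "goggle"))
```

Low KL distance, low ED, or high JI against known-benign names suggests a label "looks legitimate", which is the property a stealthy DGA tries to exploit.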
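The abstract does not spell out how the HMM-based DGA works. As a rough illustration of the general idea, the sketch below samples pronounceable-looking labels from a small character-level HMM. Everything in it is an assumption: the two hidden states, the hand-picked start, transition, and emission probabilities, the label-length range, and the helper names `sample_label` and `hmm_dga`. In practice the parameters would be estimated from a corpus of legitimate domain names (for example, with Baum–Welch). Seeding `random.Random` with a shared value, such as one derived from the date, lets the bot and the botmaster regenerate the same list.

```python
import random

# Hypothetical two-state HMM (not the paper's model): state "C" tends to
# emit consonants and state "V" vowels; probabilities are hand-picked.
START = {"C": 0.6, "V": 0.4}
TRANS = {"C": {"C": 0.3, "V": 0.7}, "V": {"C": 0.8, "V": 0.2}}
EMIT = {
    "C": {"b": 0.1, "d": 0.1, "k": 0.1, "l": 0.15, "m": 0.1,
          "n": 0.15, "r": 0.15, "s": 0.1, "t": 0.05},
    "V": {"a": 0.3, "e": 0.3, "i": 0.15, "o": 0.15, "u": 0.1},
}


def pick(rng, dist):
    """Draw one key from a {outcome: probability} dict."""
    return rng.choices(list(dist), weights=list(dist.values()))[0]


def sample_label(rng, min_len=6, max_len=10):
    """Walk the hidden chain, emitting one character per step."""
    length = rng.randint(min_len, max_len)
    state = pick(rng, START)
    chars = []
    for _ in range(length):
        chars.append(pick(rng, EMIT[state]))
        state = pick(rng, TRANS[state])
    return "".join(chars)


def hmm_dga(seed, count, tld=".com"):
    """Deterministic list of domains for a shared seed."""
    rng = random.Random(seed)
    return [sample_label(rng) + tld for _ in range(count)]


for domain in hmm_dga(seed=20170101, count=5):  # hypothetical date-based seed
    print(domain)
```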
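Likewise, the PCFG-based DGA is only named in the abstract. The sketch below illustrates the general technique of sampling strings from a probabilistic context-free grammar: each nonterminal is rewritten by a production chosen according to its probability until only terminals remain. The grammar (domains built from syllable-based words and optional digits), its probabilities, and the helper names `expand` and `pcfg_dga` are hypothetical; a realistic generator would presumably derive its rules and probabilities from legitimate domain names.

```python
import random

# Hypothetical PCFG (not the paper's grammar): each nonterminal maps to a
# list of (probability, right-hand side) pairs; other symbols are terminals.
GRAMMAR = {
    "DOMAIN": [(0.5, ["WORD", "WORD"]), (0.3, ["WORD"]),
               (0.2, ["WORD", "DIGIT", "DIGIT"])],
    "WORD": [(0.6, ["SYL", "SYL"]), (0.4, ["SYL", "SYL", "SYL"])],
    "SYL": [(0.7, ["ONSET", "VOWEL"]), (0.3, ["ONSET", "VOWEL", "CODA"])],
    "ONSET": [(0.2, ["b"]), (0.2, ["k"]), (0.2, ["m"]), (0.2, ["s"]), (0.2, ["t"])],
    "VOWEL": [(0.4, ["a"]), (0.3, ["e"]), (0.3, ["o"])],
    "CODA": [(0.5, ["n"]), (0.5, ["r"])],
    "DIGIT": [(0.1, [str(d)]) for d in range(10)],
}


def expand(rng, symbol):
    """Recursively rewrite a symbol into a terminal string."""
    if symbol not in GRAMMAR:  # terminal symbol
        return symbol
    rules = GRAMMAR[symbol]
    _, rhs = rng.choices(rules, weights=[p for p, _ in rules])[0]
    return "".join(expand(rng, s) for s in rhs)


def pcfg_dga(seed, count, tld=".com"):
    """Deterministic list of domains for a shared seed."""
    rng = random.Random(seed)
    return [expand(rng, "DOMAIN") + tld for _ in range(count)]


for domain in pcfg_dga(seed=7, count=5):
    print(domain)
```

Compared with the HMM sketch, the grammar encodes hierarchical structure (words made of syllables) rather than only local character transitions.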
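The abstract reports mixed strategies (probabilities over detectors and DGAs) but not the payoff matrix behind them. The sketch below shows how such probabilities arise in the simplest setting, under an assumption made only for illustration: a two-player zero-sum game in which the defender (rows: ED, JI) maximizes a detection rate and the botmaster (columns: HMM, PCFG) minimizes it. The payoff values are invented and are not the paper's, so the output does not reproduce the 0.78/0.22 and 0.67/0.33 split. The helper `solve_2x2_zero_sum` is hypothetical.

```python
from fractions import Fraction


def solve_2x2_zero_sum(m):
    """Solve a 2x2 zero-sum game (row player maximizes, column player minimizes).

    Returns (p, q, value): p = P(row 0), q = P(column 0) at equilibrium.
    """
    (a, b), (c, d) = m
    # Pure equilibrium: an entry that is the minimum of its row and the
    # maximum of its column (a saddle point).
    for i in range(2):
        for j in range(2):
            if m[i][j] == min(m[i]) and m[i][j] == max(m[0][j], m[1][j]):
                return 1 - i, 1 - j, m[i][j]
    # No saddle point: each player mixes so that the opponent is
    # indifferent between its two pure strategies.
    denom = a - b - c + d
    p = (d - c) / denom
    q = (d - b) / denom
    value = (a * d - b * c) / denom
    return p, q, value


# Hypothetical detection rates (NOT from the paper):
# rows = defender's detector (ED, JI), columns = botmaster's DGA (HMM, PCFG).
payoff = [[Fraction("0.6"), Fraction("0.9")],
          [Fraction("0.8"), Fraction("0.3")]]
p, q, v = solve_2x2_zero_sum(payoff)
print(f"P(ED)={float(p):.3f}, P(HMM)={float(q):.3f}, detection rate={float(v):.3f}")
# prints P(ED)=0.625, P(HMM)=0.750, detection rate=0.675
```

With these made-up numbers, playing ED with probability 0.625 yields a 0.675 detection rate against either DGA, which is exactly the indifference property that defines the mixed equilibrium. `Fraction` keeps the arithmetic exact; plain floats also work.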
