Detection of algorithmically generated domain names used by botnets: a dual arms race

Malware typically uses Domain Generation Algorithms (DGAs) as a mechanism to contact their Command and Control server. In recent years, different approaches to automatically detect generated domain names have been proposed, based on machine learning. The first problem that we address is the difficulty to systematically compare these DGA detection algorithms due to the lack of an independent benchmark. The second problem that we investigate is the difficulty for an adversary to circumvent these classifiers when the machine learning models backing these DGA-detectors are known. In this paper we compare two different approaches on the same set of DGAs: classical machine learning using manually engineered features and a 'deep learning' recurrent neural network. We show that the deep learning approach performs consistently better on all of the tested DGAs, with an average classification accuracy of 98.7% versus 93.8% for the manually engineered features. We also show that one of the dangers of manual feature engineering is that DGAs can adapt their strategy, based on knowledge of the features used to detect them. To demonstrate this, we use the knowledge of the used feature set to design a new DGA which makes the random forest classifier powerless with a classification accuracy of 59.9%. The deep learning classifier is also (albeit less) affected, reducing its accuracy to 85.5%.

[1]  Sandeep Yadav,et al.  Detecting Algorithmically Generated Domain-Flux Attacks With DNS Traffic Analysis , 2012, IEEE/ACM Transactions on Networking.

[2]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[3]  Wouter Joosen,et al.  Tranco: A Research-Oriented Top Sites Ranking Hardened Against Manipulation , 2018, NDSS.

[4]  Wouter Joosen,et al.  Automated Feature Extraction for Website Fingerprinting through Deep Learning. , 2017 .

[5]  Hyrum S. Anderson,et al.  Predicting Domain Generation Algorithms with Long Short-Term Memory Networks , 2016, ArXiv.

[6]  Herbert Bos,et al.  SoK: P2PWNED - Modeling and Evaluating the Resilience of Peer-to-Peer Botnets , 2013, 2013 IEEE Symposium on Security and Privacy.

[7]  Dan Boneh,et al.  The Space of Transferable Adversarial Examples , 2017, ArXiv.

[8]  Hyrum S. Anderson,et al.  DeepDGA: Adversarially-Tuned Domain Generation and Detection , 2016, AISec@CCS.

[9]  Shouhuai Xu,et al.  VulDeePecker: A Deep Learning-Based System for Vulnerability Detection , 2018, NDSS.

[10]  Martine De Cock,et al.  Dictionary Extraction and Detection of Algorithmically Generated Domain Names in Passive DNS Traffic , 2018, RAID.

[11]  Farnam Jahanian,et al.  The Zombie Roundup: Understanding, Detecting, and Disrupting Botnets , 2005, SRUTI.

[12]  Ulrike Meyer,et al.  FANCI : Feature-based Automated NXDomain Classification and Intelligence , 2018, USENIX Security Symposium.

[13]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[14]  Ian H. Witten,et al.  Data Mining, Fourth Edition: Practical Machine Learning Tools and Techniques , 2016 .

[15]  Johannes Bader,et al.  A Comprehensive Measurement Study of Domain Generating Malware , 2016, USENIX Security Symposium.

[16]  Stefano Zanero,et al.  Phoenix: DGA-Based Botnet Tracking and Intelligence , 2014, DIMVA.

[17]  Feifei Li,et al.  DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning , 2017, CCS.

[18]  Roberto Perdisci,et al.  From Throw-Away Traffic to Bots: Detecting the Rise of DGA-Based Malware , 2012, USENIX Security Symposium.

[19]  Miranda Mowbray,et al.  Finding Domain-Generation Algorithms by Looking at Length Distribution , 2014, 2014 IEEE International Symposium on Software Reliability Engineering Workshops.