DGA detection using machine learning methods

A botnet is a network of private computers infected with malicious software and controlled as a group without their owners' knowledge. Cyber criminals use botnets for malicious activities such as stealing sensitive data, sending spam, and launching Distributed Denial of Service (DDoS) attacks. A Command and Control (C&C) server sends commands to the compromised hosts to carry out these activities. To avoid detection, recent botnets such as Conficker, Zeus, and Cryptolocker apply a technique called Domain Fluxing, or Domain Name Generation Algorithms (DGA), in which the infected bot periodically generates and tries to resolve a large number of pseudorandom domain names until one of them is resolved by the DNS server. In this thesis, we survey machine learning methods for detecting such DGAs by analyzing only the alphanumeric characteristics of the domain names observed in the network. We propose unsupervised models and evaluate their performance against existing supervised models used in previous research in this field. In addition, we propose a novel unsupervised one-class SVM approach to anomaly detection, which we call Random One Class SVM (ROC-SVM). Our proposed unsupervised methods achieve better results than the supervised techniques they are compared with, while also detecting zero-day DGAs. When run time is the main concern, ROC-SVM is the best of the models considered.
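To make the mechanism concrete, the following minimal Python sketch illustrates two ideas from the abstract: a bot deriving pseudorandom domain names from a shared seed and the current date, and a few purely alphanumeric features computed from a domain name. It is an illustration under stated assumptions, not the thesis's method: `toy_dga`, `char_entropy`, and `lexical_features` are hypothetical names, the hash-based generator does not reproduce any real malware family, and the chosen features (length, character entropy, vowel and digit ratios) are only plausible examples of character-level statistics.

```python
import hashlib
import math
from collections import Counter
from datetime import date


def toy_dga(seed: str, day: date, count: int = 5, length: int = 12) -> list:
    """Hypothetical DGA: derive `count` pseudorandom domains from a seed and a date."""
    domains = []
    for i in range(count):
        material = f"{seed}-{day.isoformat()}-{i}".encode()
        digest = hashlib.sha256(material).digest()
        # Map the first `length` digest bytes to lowercase letters.
        label = "".join(chr(ord("a") + b % 26) for b in digest[:length])
        domains.append(label + ".com")
    return domains


def char_entropy(label: str) -> float:
    """Shannon entropy (in bits) of the character distribution of `label`."""
    n = len(label)
    return -sum((c / n) * math.log2(c / n) for c in Counter(label).values())


def lexical_features(domain: str) -> dict:
    """Illustrative alphanumeric features of the leftmost label of `domain`."""
    label = domain.split(".")[0].lower()
    n = len(label)
    return {
        "length": n,
        "entropy": char_entropy(label),
        "vowel_ratio": sum(ch in "aeiou" for ch in label) / n,
        "digit_ratio": sum(ch.isdigit() for ch in label) / n,
    }


if __name__ == "__main__":
    # Compare a human-chosen domain with generated ones.
    for d in ["google.com"] + toy_dga("example-seed", date(2015, 1, 1)):
        print(d, lexical_features(d))
```

Because the generator is deterministic in the seed and date, the bot and its controller can compute the same candidate list independently. Feature vectors like those returned by `lexical_features` are the kind of input a supervised classifier or a one-class anomaly detector could be trained on.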
