From Throw-Away Traffic to Bots: Detecting the Rise of DGA-Based Malware

Many botnet detection systems employ a blacklist of known command and control (C&C) domains to detect bots and block their traffic. Similar to signature-based virus detection, such a botnet detection approach is static because the blacklist is updated only after running an external (and often manual) process of domain discovery. As a response, botmasters have begun employing domain generation algorithms (DGAs) to dynamically produce a large number of random domain names and select a small subset for actual C&C use. That is, a C&C domain is randomly generated and used for a very short period of time, thus rendering detection approaches that rely on static domain lists ineffective. Naturally, if we know how a domain generation algorithm works, we can generate the domains ahead of time and still identify and block bot-net C&C traffic. The existing solutions are largely based on reverse engineering of the bot malware executables, which is not always feasible. In this paper we present a new technique to detect randomly generated domains without reversing. Our insight is that most of the DGA-generated (random) domains that a bot queries would result in Non-Existent Domain (NXDomain) responses, and that bots from the same bot-net (with the same DGA algorithm) would generate similar NXDomain traffic. Our approach uses a combination of clustering and classification algorithms. The clustering algorithm clusters domains based on the similarity in the make-ups of domain names as well as the groups of machines that queried these domains. The classification algorithm is used to assign the generated clusters to models of known DGAs. If a cluster cannot be assigned to a known model, then a new model is produced, indicating a new DGA variant or family. We implemented a prototype system and evaluated it on real-world DNS traffic obtained from large ISPs in North America. We report the discovery of twelve DGAs. Half of them are variants of known (botnet) DGAs, and the other half are brand new DGAs that have never been reported before.

[1]  Michael Ligh,et al.  Malware Analyst's Cookbook and DVD: Tools and Techniques for Fighting Malicious Code , 2010 .

[2]  Paul V. Mockapetris,et al.  Domain names: Concepts and facilities , 1983, RFC.

[3]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[4]  Sven Dietrich,et al.  Analysis of the Storm and Nugache Trojans: P2P Is Here , 2007, login Usenix Mag..

[5]  Guofei Gu,et al.  BotSniffer: Detecting Botnet Command and Control Channels in Network Traffic , 2008, NDSS.

[6]  Fabian Monrose,et al.  DNS Prefetching and Its Privacy Implications: When Good Things Go Bad , 2010, LEET.

[7]  Mark Newman,et al.  Networks: An Introduction , 2010 .

[8]  Vinod Yegneswaran,et al.  BotHunter: Detecting Malware Infection Through IDS-Driven Dialog Correlation , 2007, USENIX Security Symposium.

[9]  Guofei Gu,et al.  BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection , 2008, USENIX Security Symposium.

[10]  Ludmila I. Kuncheva,et al.  Combining Pattern Classifiers: Methods and Algorithms , 2004 .

[11]  Xiapu Luo,et al.  Detecting stealthy P2P botnets using statistical traffic fingerprints , 2011, 2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN).

[12]  Paul V. Mockapetris,et al.  Domain names - implementation and specification , 1987, RFC.

[13]  Subhash C. Bagui,et al.  Combining Pattern Classifiers: Methods and Algorithms , 2005, Technometrics.

[14]  Christopher Krügel,et al.  Your botnet is my botnet: analysis of a botnet takeover , 2009, CCS.

[15]  Kjersti Aas,et al.  Text Categorisation: A Survey , 1999 .

[16]  Mohamed Naguib,et al.  What we know: precise measurement leads to patient comfort and safety. , 2011, Anesthesiology.

[17]  Hassen Saïdi,et al.  A Foray into Conficker's Logic and Rendezvous Points , 2009, LEET.

[18]  Sandeep Yadav,et al.  Winning with DNS Failures: Strategies for Faster Botnet Detection , 2011, SecureComm.

[19]  Andrew W. Moore,et al.  X-means: Extending K-means with Efficient Estimation of the Number of Clusters , 2000, ICML.

[20]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[21]  Ronen Feldman,et al.  Book Reviews: The Text Mining Handbook: Advanced Approaches to Analyzing Unstructured Data by Ronen Feldman and James Sanger , 2008, CL.

[22]  Leyla Bilge,et al.  EXPOSURE: Finding Malicious Domains Using Passive DNS Analysis , 2011, NDSS.

[23]  R. Villamarin-Salomon,et al.  Identifying Botnets Using Anomaly Detection Techniques Applied to DNS Traffic , 2008, 2008 5th IEEE Consumer Communications and Networking Conference.

[24]  Wenke Lee,et al.  Detecting Malware Domains at the Upper DNS Hierarchy , 2011, USENIX Security Symposium.

[25]  Nick Feamster,et al.  Building a Dynamic Reputation System for DNS , 2010, USENIX Security Symposium.

[26]  Sandeep Yadav,et al.  Detecting algorithmically generated malicious domain names , 2010, IMC '10.

[27]  Yoav Freund,et al.  The Alternating Decision Tree Learning Algorithm , 1999, ICML.

[28]  Michael K. Reiter,et al.  Traffic Aggregation for Malware Detection , 2008, DIMVA.

[29]  Michael K. Reiter,et al.  Are Your Hosts Trading or Plotting? Telling P2P File-Sharing and Bots Apart , 2010, 2010 IEEE 30th International Conference on Distributed Computing Systems.