Efficient and Accurate Behavior-Based Tracking of Malware-Control Domains in Large ISP Networks

In this article, we propose Segugio, a novel defense system that allows for efficiently tracking the occurrence of new malware-control domain names in very large ISP networks. Segugio passively monitors the DNS traffic to build a machine-domain bipartite graph representing who is querying what. After labeling nodes in this query behavior graph that are known to be either benign or malware-related, we propose a novel approach to accurately detect previously unknown malware-control domains. We implemented a proof-of-concept version of Segugio and deployed it in large ISP networks that serve millions of users. Our experimental results show that Segugio can track the occurrence of new malware-control domains with up to 94% true positives (TPs) at less than 0.1% false positives (FPs). In addition, we provide the following results: (1) we show that Segugio can also detect control domains related to new, previously unseen malware families, with 85% TPs at 0.1% FPs; (2) Segugio’s detection models learned on traffic from a given ISP network can be deployed into a different ISP network and still achieve very high detection accuracy; (3) new malware-control domains can be detected days or even weeks before they appear in a large commercial domain-name blacklist; (4) Segugio can be used to detect previously unknown malware-infected machines in ISP networks; and (5) we show that Segugio clearly outperforms domain-reputation systems based on Belief Propagation.

[1]  Leyla Bilge,et al.  EXPOSURE: Finding Malicious Domains Using Passive DNS Analysis , 2011, NDSS.

[2]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[3]  Nick Feamster,et al.  Behavioral Clustering of HTTP-Based Malware and Signature Generation Using Malicious Network Traces , 2010, NSDI.

[4]  Vinod Yegneswaran,et al.  BotHunter: Detecting Malware Infection Through IDS-Driven Dialog Correlation , 2007, USENIX Security Symposium.

[5]  Nasir D. Memon,et al.  Friends of an enemy: identifying local members of peer-to-peer botnets using mutual contacts , 2010, ACSAC '10.

[6]  Vern Paxson,et al.  Measuring Pay-per-Install: The Commoditization of Malware Distribution , 2011, USENIX Security Symposium.

[7]  Babak Rahbarinia,et al.  Segugio: Efficient Behavior-Based Tracking of Malware-Control Domains in Large ISP Networks , 2015, 2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks.

[8]  Christos Faloutsos,et al.  Polonium: Tera-Scale Graph Mining and Inference for Malware Detection , 2011 .

[9]  Christian Rossow,et al.  RUHR-UNIVERSITÄT BOCHUM , 2014 .

[10]  Michalis Faloutsos,et al.  BLINC: multilevel traffic classification in the dark , 2005, SIGCOMM '05.

[11]  Wenke Lee,et al.  Detecting Malware Domains at the Upper DNS Hierarchy , 2011, USENIX Security Symposium.

[12]  Kuai Xu,et al.  Network-aware behavior clustering of Internet end hosts , 2011, 2011 Proceedings IEEE INFOCOM.

[13]  Guofei Gu,et al.  BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection , 2008, USENIX Security Symposium.

[14]  Ludmila I. Kuncheva,et al.  Combining Pattern Classifiers: Methods and Algorithms , 2004 .

[15]  Xiapu Luo,et al.  Detecting stealthy P2P botnets using statistical traffic fingerprints , 2011, 2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN).

[16]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[17]  Nick Feamster,et al.  Building a Dynamic Reputation System for DNS , 2010, USENIX Security Symposium.

[18]  Vern Paxson,et al.  On the Potential of Proactive Domain Blacklisting , 2010, LEET.

[19]  Sandeep Yadav,et al.  Detecting Malicious Domains via Graph Inference , 2014, AISec '14.

[20]  Keisuke Ishibashi,et al.  Extending Black Domain Name List by Using Co-occurrence Relation between DNS Queries , 2010, LEET.

[21]  Roberto Perdisci,et al.  ExecScent: Mining for New C&C Domains in Live Networks with Adaptive Control Protocol Templates , 2013, USENIX Security Symposium.

[22]  Juan Caballero,et al.  FIRMA: Malware Clustering and Network Signature Generation with Mixed Network Behaviors , 2013, RAID.

[23]  Joseph E. Gonzalez,et al.  GraphLab: A New Parallel Framework for Machine Learning , 2010 .

[24]  Roberto Perdisci,et al.  From Throw-Away Traffic to Bots: Detecting the Rise of DGA-Based Malware , 2012, USENIX Security Symposium.

[25]  Herbert Bos,et al.  Large-Scale Analysis of Malware Downloaders , 2012, DIMVA.

[26]  Guofei Gu,et al.  BotSniffer: Detecting Botnet Command and Control Channels in Network Traffic , 2008, NDSS.

[27]  Christopher Krügel,et al.  JACKSTRAWS: Picking Command and Control Connections from Bot Traffic , 2011, USENIX Security Symposium.

[28]  Le Song,et al.  Kernel Belief Propagation , 2011, AISTATS.

[29]  Leyla Bilge,et al.  Automatically Generating Models for Botnet Detection , 2009, ESORICS.

[30]  Subhash C. Bagui,et al.  Combining Pattern Classifiers: Methods and Algorithms , 2005, Technometrics.

[31]  Nir Friedman,et al.  Probabilistic Graphical Models - Principles and Techniques , 2009 .

[32]  Christopher Krügel,et al.  A survey on automated dynamic malware-analysis techniques and tools , 2012, CSUR.

[33]  Michael K. Reiter,et al.  Are Your Hosts Trading or Plotting? Telling P2P File-Sharing and Bots Apart , 2010, 2010 IEEE 30th International Conference on Distributed Computing Systems.