ExecScent: Mining for New C&C Domains in Live Networks with Adaptive Control Protocol Templates

In this paper, we present ExecScent, a novel system that aims to mine new, previously unknown C&C domain names from live enterprise network traffic. ExecScent automatically learns control protocol templates (CPTs) from examples of known C&C communications. These CPTs are then adapted to the "background traffic" of the network where the templates are to be deployed. The goal is to generate hybrid templates that can self-tune to each specific deployment scenario, thus yielding a better trade-off between true and false positives for a given network environment. To the best of our knowledge, ExecScent is the first system to use this type of adaptive C&C traffic models. We implemented a prototype version of ExecScent, and deployed it in three different large networks for a period of two weeks. During the deployment, we discovered many new, previously unknown C&C domains and hundreds of new infected machines, compared to using a large up-to-date commercial C&C domain blacklist. Furthermore, we deployed the new C&C domains mined by ExecScent to six large ISP networks, discovering more than 25,000 new infected machines.

[1]  David Brumley,et al.  BitShred: feature hashing malware for scalable triage and semantic analysis , 2011, CCS '11.

[2]  Nick Feamster,et al.  Building a Dynamic Reputation System for DNS , 2010, USENIX Security Symposium.

[3]  Christopher Krügel,et al.  A survey on automated dynamic malware-analysis techniques and tools , 2012, CSUR.

[4]  Roy T. Fielding,et al.  Uniform Resource Identifiers (URI): Generic Syntax , 1998, RFC.

[5]  Roberto Perdisci,et al.  Scalable fine-grained behavioral clustering of HTTP-based malware , 2013, Comput. Networks.

[6]  Juan Caballero,et al.  FIRMA: Malware Clustering and Network Signature Generation with Mixed Network Behaviors , 2013, RAID.

[7]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .

[8]  Wenke Lee,et al.  Detecting Malware Domains at the Upper DNS Hierarchy , 2011, USENIX Security Symposium.

[9]  George Varghese,et al.  Automated Worm Fingerprinting , 2004, OSDI.

[10]  Yajin Zhou,et al.  Dissecting Android Malware: Characterization and Evolution , 2012, 2012 IEEE Symposium on Security and Privacy.

[11]  Guofei Gu,et al.  BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection , 2008, USENIX Security Symposium.

[12]  Vern Paxson,et al.  Measuring Pay-per-Install: The Commoditization of Malware Distribution , 2011, USENIX Security Symposium.

[13]  Guofei Gu,et al.  BotSniffer: Detecting Botnet Command and Control Channels in Network Traffic , 2008, NDSS.

[14]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[15]  James Newsome,et al.  Polygraph: automatically generating signatures for polymorphic worms , 2005, 2005 IEEE Symposium on Security and Privacy (S&P'05).

[16]  Leyla Bilge,et al.  Automatically Generating Models for Botnet Detection , 2009, ESORICS.

[17]  Christopher Krügel,et al.  JACKSTRAWS: Picking Command and Control Connections from Bot Traffic , 2011, USENIX Security Symposium.

[18]  Marcus A. Maloof,et al.  Learning to Detect and Classify Malicious Executables in the Wild , 2006, J. Mach. Learn. Res..

[19]  Geoff Hulten,et al.  Spamming botnets: signatures and characteristics , 2008, SIGCOMM '08.

[20]  Nick Feamster,et al.  Behavioral Clustering of HTTP-Based Malware and Signature Generation Using Malicious Network Traces , 2010, NSDI.

[21]  Roberto Perdisci,et al.  From Throw-Away Traffic to Bots: Detecting the Rise of DGA-Based Malware , 2012, USENIX Security Symposium.

[22]  Francisco Casacuberta,et al.  Topology of Strings: Median String is NP-Complete , 1999, Theor. Comput. Sci..

[23]  B. Karp,et al.  Autograph: Toward Automated, Distributed Worm Signature Detection , 2004, USENIX Security Symposium.

[24]  John Platt,et al.  Probabilistic Outputs for Support vector Machines and Comparisons to Regularized Likelihood Methods , 1999 .

[25]  Leyla Bilge,et al.  EXPOSURE: Finding Malicious Domains Using Passive DNS Analysis , 2011, NDSS.