Automated Classification of C&C Connections Through Malware URL Clustering

We present WebVisor, an automated tool to derive patterns from malware Command and Control (C&C) server connections. From collective network communications stored on a large-scale malware dataset, WebVisor establishes the underlying patterns among samples of the same malware families (e.g., families in terms of development tools). WebVisor focuses on C&C channels based on the Hypertext Transfer Protocol (HTTP). First, it builds clusters based on the statistical features of the HTTP-based Uniform Resource Locators (URLs) stored in the malware dataset. Then, it conducts a fine-grained, noise-agnostic clustering process, based on the structure and semantic features of the URLs. We present experimental results using a software prototype of WebVisor and real-world malware datasets.

[1]  Matthew A. Jaro,et al.  Advances in Record-Linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida , 1989 .

[2]  John Riedl,et al.  Generalized suffix trees for biological sequence data: applications and implementation , 1994, 1994 Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences.

[3]  Hans-Peter Kriegel,et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.

[4]  D. Pham,et al.  An Incremental K-means algorithm , 2004 .

[5]  James Newsome,et al.  Polygraph: automatically generating signatures for polymorphic worms , 2005, 2005 IEEE Symposium on Security and Privacy (S&P'05).

[6]  Wenke Lee,et al.  Misleading worm signature generators using deliberate noise injection , 2006, 2006 IEEE Symposium on Security and Privacy (S&P'06).

[7]  U. Bayer,et al.  TTAnalyze: A Tool for Analyzing Malware , 2006 .

[8]  Ming-Yang Kao,et al.  Hamsa: fast signature generation for zero-day polymorphic worms with provable attack resilience , 2006, 2006 IEEE Symposium on Security and Privacy (S&P'06).

[9]  Farnam Jahanian,et al.  CloudAV: N-Version Antivirus in the Network Cloud , 2008, USENIX Security Symposium.

[10]  Carsten Willems,et al.  Learning and Classification of Malware Behavior , 2008, DIMVA.

[11]  Guofei Gu,et al.  BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection , 2008, USENIX Security Symposium.

[12]  Prateek Mittal,et al.  BotGrep: Finding P2P Bots with Structured Graph Analysis , 2010, USENIX Security Symposium.

[13]  Shouhuai Xu,et al.  Social Network-Based Botnet Command-and-Control: Emerging Threats and Countermeasures , 2010, ACNS.

[14]  Christopher Krügel,et al.  JACKSTRAWS: Picking Command and Control Connections from Bot Traffic , 2011, USENIX Security Symposium.

[15]  Christian Platzer,et al.  Detecting malware's failover C&C strategies with squeeze , 2011, ACSAC '11.

[16]  Michalis Faloutsos,et al.  PhishDef: URL names say it all , 2010, 2011 Proceedings IEEE INFOCOM.

[17]  R. Kashyap,et al.  The New Era of Botnets , 2012 .

[18]  Roberto Perdisci,et al.  From Throw-Away Traffic to Bots: Detecting the Rise of DGA-Based Malware , 2012, USENIX Security Symposium.

[19]  Juan Caballero,et al.  Driving in the Cloud: An Analysis of Drive-by Download Operations and Abuse Reporting , 2013, DIMVA.

[20]  Nizar Kheir,et al.  Behavioral classification and detection of malware through HTTP user agent anomalies , 2013, J. Inf. Secur. Appl..

[21]  Nizar Kheir,et al.  BotSuer: Suing Stealthy P2P Bots in Network Traffic through Netflow Analysis , 2013, CANS.

[22]  Xiao Han,et al.  PeerViewer: Behavioral Tracking and Classification of P2P Malware , 2013, CSS.

[23]  Juan Caballero,et al.  FIRMA: Malware Clustering and Network Signature Generation with Mixed Network Behaviors , 2013, RAID.

[24]  Roberto Perdisci,et al.  Scalable fine-grained behavioral clustering of HTTP-based malware , 2013, Comput. Networks.