SAS: semantics aware signature generation for polymorphic worm detection

String extraction and matching techniques have been widely used to generate signatures for worm detection, but generating effective worm signatures in an adversarial environment remains a challenging problem. For example, attackers can freely manipulate byte distributions within attack payloads and inject well-crafted noisy packets to contaminate the suspicious flow pool. To address these attacks, we propose SAS, a novel Semantics Aware Statistical algorithm for automatic signature generation. SAS first applies data flow analysis to the packets in a suspicious flow pool to remove non-critical bytes, and then applies a hidden Markov model (HMM) to the refined data to generate signatures based on state transition graphs. To the best of our knowledge, this is the first work to combine semantic analysis with statistical analysis for automatic worm signature generation. Our experiments show that the proposed technique can accurately detect worms with concise signatures. Moreover, our results indicate that SAS is more robust to byte distribution changes and noise injection attacks than Polygraph and Hamsa.
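As a rough illustration of the statistical half of this pipeline, the sketch below scores a sequence of symbols against a small discrete HMM using the standard scaled forward algorithm. It is not the SAS implementation: the two states, the instruction-mnemonic symbols, and every probability in START, TRANS, and EMIT are hypothetical values chosen for the example, and the input is assumed to be a payload already reduced to its critical instructions by a separate data flow analysis step.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Log-probability of obs under a discrete HMM (scaled forward algorithm)."""
    if not obs:
        return 0.0  # the empty sequence has probability 1
    n_states = len(start)
    # Initial step: start probability times emission of the first symbol.
    alpha = [start[s] * emit[s].get(obs[0], 0.0) for s in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then emit obs[t].
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states))
                * emit[s].get(obs[t], 0.0)
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_prob

# Hypothetical two-state model: state 0 ~ "setup", state 1 ~ "loop".
START = [0.9, 0.1]
TRANS = [[0.6, 0.4],
         [0.1, 0.9]]
EMIT = [
    {"mov": 0.7, "xor": 0.1, "inc": 0.1, "jmp": 0.1},
    {"mov": 0.1, "xor": 0.4, "inc": 0.3, "jmp": 0.2},
]

# Hypothetical refined instruction sequence taken from one suspicious flow.
sequence = ["mov", "xor", "inc", "jmp"]
score = forward_log_likelihood(sequence, START, TRANS, EMIT)
print("per-symbol log-likelihood:", score / len(sequence))
```

In a detector built along these lines, a flow would be flagged when its per-symbol log-likelihood exceeds a threshold tuned on training data; that threshold, like the model parameters above, is an assumption of this sketch rather than something stated in the abstract.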
