Incorporating known malware signatures to classify new malware variants in network traffic

Summary: Content-based malware classification techniques that use n-gram features incur high computational overhead because of the size of the feature space. This paper proposes augmenting machine learning techniques with domain knowledge, in the form of known Snort malware signatures, to reduce resource usage (the time needed to generate the machine learning model and the memory needed to store the resulting model). Although current malware can be encrypted or mutated, it still shares prevalent content or payloads with its predecessors. Using a dataset of traffic captured from a campus network, our approach reduces the million-scale set of initially generated n-gram features to only around 90,000 features, which cuts the processing time needed to generate the naive Bayes model by 95%. The model trained on the most descriptive features (4-gram Snort signatures with high information gain) produces a lower false negative rate, about 2%, compared with other models. Moreover, the proposed method is capable of detecting 10 new malware variants with 0% false negatives. The findings from this paper can serve as a basis for improving content-based malware classification to detect both known and new malware. Copyright © 2015 John Wiley & Sons, Ltd.
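
The feature-reduction idea described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that signature content strings are available as raw bytes, that each payload carries a binary malware/benign label, and that a candidate feature is kept only if it is a 4-gram that also occurs in some signature, with the survivors ranked by information gain. The function names (`ngrams`, `information_gain`, `select_features`) and the toy data used with them are hypothetical.

```python
import math


def ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of distinct byte n-grams in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def entropy(pos: int, neg: int) -> float:
    """Binary entropy (in bits) of a group with pos/neg label counts."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(feature: bytes, samples: list) -> float:
    """Information gain of the presence/absence of feature.

    samples is a list of (ngram_set, label) pairs, label True for malware.
    """
    def label_entropy(labels):
        pos = sum(labels)
        return entropy(pos, len(labels) - pos)

    all_labels = [label for _, label in samples]
    present = [label for grams, label in samples if feature in grams]
    absent = [label for grams, label in samples if feature not in grams]
    total = len(samples)
    conditional = (len(present) / total) * label_entropy(present) \
        + (len(absent) / total) * label_entropy(absent)
    return label_entropy(all_labels) - conditional


def select_features(payloads, signatures, n=4, top_k=10):
    """Keep n-grams seen in the traffic that also occur in a signature,
    then return the top_k of them ranked by information gain.

    Assumption: payloads is a list of (bytes, bool) pairs and signatures
    is a list of bytes; both formats are hypothetical.
    """
    samples = [(ngrams(payload, n), label) for payload, label in payloads]
    signature_grams = set().union(*(ngrams(sig, n) for sig in signatures))
    observed = set().union(*(grams for grams, _ in samples))
    candidates = observed & signature_grams  # the domain-knowledge filter
    ranked = sorted(candidates,
                    key=lambda f: (-information_gain(f, samples), f))
    return ranked[:top_k]


if __name__ == "__main__":
    # Toy, invented payloads and signature, for illustration only.
    toy_payloads = [
        (b"GET /evil.exe HTTP/1.1", True),
        (b"POST /evil.exe?id=1", True),
        (b"GET /index.html HTTP/1.1", False),
        (b"GET /about.html HTTP/1.1", False),
    ]
    print(select_features(toy_payloads, [b"evil.exe"], n=4, top_k=3))
```

In this sketch the signature filter does most of the pruning: n-grams such as `b"HTTP"` never reach the information-gain ranking because they do not occur in any signature, which mirrors the reduction in feature-space size that the summary reports. The selected n-grams would then serve as the feature vocabulary for a classifier such as naive Bayes.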
