ECFGM: enriched control flow graph miner for unknown vicious infected code detection

Vicious code, especially viruses, is a prominent kind of malware that has caused many disasters and continues to exploit new vulnerabilities. Such code is injected into benign programs to abuse its hosts and ease its propagation. The offsets of injected virus code are unknown, and its targets usually remain latent until the code is executed and activated, which in turn makes viruses very hard to detect. In this paper, the enriched control flow graph miner (ECFGM) is presented to detect files infected by unknown viruses. ECFGM uses the enriched control flow graph (ECFG) model to represent benign and vicious code. This model carries more information than a traditional control flow graph (CFG) by incorporating statistical information about dependent assembly instructions and API calls. To the best of our knowledge, the approach presented in this paper is the first that can recognize the offset of the infected code of unknown viruses in victim files. The contributions of this paper are twofold. First, the presented model is able to detect unknown vicious code using the ECFG model with reasonable complexity and desirable accuracy. Second, our approach is resistant to metamorphic viruses that use dead code insertion, variable renaming, and instruction reordering.
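As a rough illustration of what enriching a CFG with per-block statistics could look like, the following Python sketch splits a toy instruction listing into basic blocks, links them by control flow, and annotates each block with mnemonic counts and the API names it calls. This is not the ECFG construction from the paper, whose details are not given in the abstract; the toy listing, the jump sets, the function name `build_enriched_cfg`, and the node fields `mnemonics` and `api_calls` are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical toy listing: (address, mnemonic, operand).
# Jump operands are target addresses; call operands are API names.
PROGRAM = [
    (0, "mov", "eax, 1"),
    (1, "cmp", "eax, 0"),
    (2, "jz", 5),
    (3, "call", "CreateFileA"),
    (4, "jmp", 6),
    (5, "call", "WriteFile"),
    (6, "ret", None),
]

COND_JUMPS = {"jz", "jnz"}   # assumed subset of conditional jumps
UNCOND_JUMPS = {"jmp"}       # assumed subset of unconditional jumps


def build_enriched_cfg(program):
    """Return (nodes, edges) for a toy CFG with per-block statistics."""
    # A leader starts a basic block: the first instruction, every jump
    # target, and every instruction that follows a jump.
    leaders = {program[0][0]}
    for i, (_, op, arg) in enumerate(program):
        if op in COND_JUMPS | UNCOND_JUMPS:
            leaders.add(arg)
            if i + 1 < len(program):
                leaders.add(program[i + 1][0])

    # Group consecutive instructions into blocks keyed by leader address.
    blocks, current = {}, None
    for ins in program:
        if ins[0] in leaders:
            current = ins[0]
            blocks[current] = []
        blocks[current].append(ins)

    starts = sorted(blocks)
    nodes, edges = {}, set()
    for idx, start in enumerate(starts):
        body = blocks[start]
        # "Enrichment": attach simple statistics to each block.
        nodes[start] = {
            "mnemonics": Counter(op for _, op, _ in body),
            "api_calls": [arg for _, op, arg in body if op == "call"],
        }
        _, last_op, last_arg = body[-1]
        nxt = starts[idx + 1] if idx + 1 < len(starts) else None
        if last_op in UNCOND_JUMPS:
            edges.add((start, last_arg))
        elif last_op in COND_JUMPS:
            edges.add((start, last_arg))       # branch taken
            if nxt is not None:
                edges.add((start, nxt))        # fall through
        elif last_op != "ret" and nxt is not None:
            edges.add((start, nxt))            # plain fall through
    return nodes, edges


nodes, edges = build_enriched_cfg(PROGRAM)
for start in sorted(nodes):
    print(start, dict(nodes[start]["mnemonics"]), nodes[start]["api_calls"])
print(sorted(edges))
```

Keeping the statistics on the nodes rather than in a separate table means a graph-mining step could compare blocks by both structure and content; how ECFGM actually weighs these features is not stated in the abstract.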
