Instructions-Based Detection of Sophisticated Obfuscation and Packing

Thousands of new malware samples are released every day. The vast majority employ some form of obfuscation, ranging from simple XOR encryption to more sophisticated anti-analysis, packing, and encryption techniques. Dynamic analysis can unpack a file and reveal its hidden code, but it is far more time-consuming than static analysis, and given the volume of new malware produced daily, relying on dynamic analysis alone is impractical. An effective way to filter samples, so that only obfuscated and suspicious ones are delegated to more rigorous tests, would therefore significantly improve the overall scanning process. Current techniques for identifying obfuscation rely mainly on signatures of known packers, file entropy scores, or anomalies in the file header. However, these features are not only easy to bypass but also fail to cover all types of obfuscation. In this paper, we introduce a novel approach that identifies obfuscated files based on anomalies in their instruction-based characteristics. We detect the presence of interleaving instructions, which result from the opaque-predicate anti-disassembly trick, and present distinguishing statistical properties based on the opcodes and control flow graphs of obfuscated files. Our detection system combines these features with other structural file features and achieves very good results in detecting obfuscated malware.
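To make the entropy-based baseline mentioned above concrete, the following is a minimal sketch of a byte-level Shannon entropy score and of why such a score is easy to evade. It is not the detection system described in this paper. The function names `shannon_entropy` and `looks_packed` and the threshold of 7.0 bits per byte are assumptions chosen purely for illustration.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # occurrences of each byte value
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_packed(data: bytes, threshold: float = 7.0) -> bool:
    """Flag data whose entropy meets a threshold.

    The default threshold is an illustrative assumption, not a value
    taken from the paper.
    """
    return shannon_entropy(data) >= threshold


# A uniformly distributed payload stands in for encrypted or compressed code.
payload = bytes(range(256)) * 4

# Appending low-entropy padding lowers the whole-file score, which is one
# way an entropy-only check can be bypassed.
padded = payload + b"\x00" * 4096

print(looks_packed(payload))
print(looks_packed(padded))
```

The padded variant contains the same high-entropy payload but falls below the threshold, which illustrates why entropy alone is a weak signal and why complementary features are worth combining with it.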
