A Method for Shellcode Extraction from Malicious Document Files Using Entropy and Emulation

We propose a method for the dynamic analysis of malicious documents that can exploit various types of vulnerability in applications. Static analysis of a document can be used to identify the type of vulnerability involved. However, unknown vulnerabilities can be difficult to identify, and the vulnerable application may not be available even when the vulnerability can be identified. Moreover, in many cases the malicious code executed after exploitation is unrelated to the type of vulnerability. In this paper, we propose a method that extracts and executes "shellcode" to analyze malicious documents without requiring identification of the vulnerability or the application. Our system extracts shellcode by executing byte sequences from a document file, in a priority order determined by entropy, and observing their features. Our system was used to analyze 88 malware samples and extracted shellcode from 74 of them. Of these, 51 extracted shellcodes behaved as malicious software according to dynamic analysis.

Index Terms—Malware, shellcode, entropy, dynamic analysis, vulnerability.
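As a rough illustration of entropy-based prioritization, the following Python sketch scores fixed-size windows of a file by Shannon entropy and sorts their offsets into a candidate order. It is not the system described in this paper: the names `shannon_entropy` and `rank_windows`, the window and step sizes, and the choice to rank high-entropy windows first are all assumptions made for illustration, and the emulation step that would execute each candidate is omitted.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for constant or empty data, at most 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rank_windows(blob: bytes, window: int = 256, step: int = 256,
                 descending: bool = True) -> list[tuple[int, float]]:
    """Return (offset, entropy) pairs for each window, sorted by entropy.

    Assumption: `descending=True` tries high-entropy regions first; the
    direction actually used by a real system may differ.
    """
    last_start = max(len(blob) - window + 1, 1)
    scored = [
        (offset, shannon_entropy(blob[offset:offset + window]))
        for offset in range(0, last_start, step)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=descending)


if __name__ == "__main__":
    # Hypothetical input: a zero-filled region followed by a region of varied bytes.
    sample = b"\x00" * 256 + bytes(range(256))
    for offset, entropy in rank_windows(sample):
        print(f"offset={offset:#06x} entropy={entropy:.2f}")
```

In such a sketch, each ranked offset would then be handed to an emulator as a candidate entry point, so that the most promising regions are tried before the rest of the file.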