Towards Static Analysis of Virtualization-Obfuscated Binaries

Virtualization-obfuscation protects a program from manual or automated analysis by compiling it into byte code for a randomized virtual architecture and attaching a corresponding interpreter. Static analysis appears to be helpless on such programs, where only the code of the interpreter is directly visible. In this paper, we explain the particular challenges of statically analyzing the combination of interpreter and byte code. Static analyses that compute possible variable values are commonly precise only up to the program location. In the interpreter loop, however, this granularity merges unrelated data flow information from different locations of the byte code program. To avoid this loss of information, we show how to lift an existing static analysis to an additional dimension of location, so that it becomes sensitive to the value of the virtual program counter. The lifted analysis thus merges data flow only from equal byte code locations. We lift an existing analysis implemented in the Jakstab static analyzer and present preliminary results for processing a virtualization-obfuscated binary.
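
The effect of the lifting can be illustrated with a minimal Python sketch. It is not the Jakstab implementation: the toy byte code format, the register names, and the simple constant-or-TOP value domain are assumptions made purely for illustration. The first analysis keeps a single abstract state for the interpreter loop and therefore merges the register values of all byte code locations; the second keeps one abstract state per value of the virtual program counter.

```python
TOP = "TOP"  # abstract value meaning "any integer"

# Hypothetical toy byte code: (opcode, register, operand).
PROGRAM = [
    ("set", "r0", 1),
    ("set", "r1", 10),
    ("add", "r0", 5),
    ("halt", None, None),
]

INIT = {"r0": 0, "r1": 0}  # assumed initial register contents


def join_value(a, b):
    """Join in the constant domain: equal values stay, different ones go to TOP."""
    return a if a == b else TOP


def join_regs(a, b):
    return {r: join_value(a[r], b[r]) for r in a}


def step(pc, regs):
    """Abstractly execute the instruction at pc; return (next_pc, regs) or None on halt."""
    op, reg, arg = PROGRAM[pc]
    if op == "halt":
        return None
    out = dict(regs)
    if op == "set":
        out[reg] = arg
    elif op == "add":
        out[reg] = TOP if regs[reg] == TOP else regs[reg] + arg
    return pc + 1, out


def analyze_merged():
    """Location-only analysis: one abstract state shared by all vpc values."""
    vpcs, regs = {0}, dict(INIT)
    changed = True
    while changed:
        changed = False
        for pc in sorted(vpcs):
            result = step(pc, regs)
            if result is None:
                continue
            next_pc, out = result
            new_vpcs = vpcs | {next_pc}
            new_regs = join_regs(regs, out)
            if new_vpcs != vpcs or new_regs != regs:
                vpcs, regs, changed = new_vpcs, new_regs, True
    return regs


def analyze_vpc_sensitive():
    """Lifted analysis: one abstract state per value of the virtual program counter."""
    states = {0: dict(INIT)}
    worklist = [0]
    while worklist:
        pc = worklist.pop()
        result = step(pc, states[pc])
        if result is None:
            continue
        next_pc, out = result
        merged = join_regs(states[next_pc], out) if next_pc in states else out
        if states.get(next_pc) != merged:
            states[next_pc] = merged
            worklist.append(next_pc)
    return states


if __name__ == "__main__":
    print("merged:       ", analyze_merged())
    print("vpc-sensitive:", analyze_vpc_sensitive()[3])
```

In this toy setting the merged analysis loses both register values to TOP, because the states of all byte code locations flow into the same abstract state. The vpc-sensitive variant keeps the constants at the `halt` instruction, since data flow is joined only between states that share the same virtual program counter value.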