Generating Effective Software Obfuscation Sequences With Reinforcement Learning

Obfuscation is a prevalent security technique that transforms the syntactic representation of a program into a more complicated form while keeping the program's semantics unchanged. Developers rely heavily on obfuscation to harden their products and reduce the risk of adversarial reverse engineering. However, despite substantial progress, one crucial hurdle remains: each existing obfuscation method is designed to obfuscate one specific program feature (e.g., identifier names, control flow), so an effective obfuscation scheme usually composes a considerable number of different obfuscation methods. The primary challenge therefore lies in identifying effective combinations of obfuscation methods. In this research, we propose a principled technique for generating an optimal program obfuscation scheme by adopting a reinforcement learning approach. Given a program and a set of obfuscation transformations, a reinforcement learning model is progressively trained to select a sequence of obfuscation transformations such that applying the transformations to the program in order yields the optimal obfuscation result, making the obfuscated program dissimilar to the original while keeping instrumentation overhead reasonable. Our implementation works directly on raw binary executables without source code, and our evaluation demonstrates that the trained models can effectively obfuscate executable files at low cost.
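To make the sequence-selection formulation concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes a toy setting: four hypothetical transformations with made-up dissimilarity gains and overhead costs, a reward equal to the gain minus the weighted overhead, and tabular Q-learning with a uniformly random exploration policy. The transformation names, the numbers, and the helper functions are all illustrative assumptions; a real system would measure dissimilarity and overhead on the transformed binary.

```python
import random
from collections import defaultdict

# Hypothetical transformations: name -> (dissimilarity gain, overhead cost).
# All numbers are illustrative assumptions, not measurements.
TRANSFORMS = {
    "rename_identifiers": (0.10, 0.00),
    "flatten_control_flow": (0.40, 0.35),
    "insert_opaque_predicates": (0.30, 0.10),
    "substitute_instructions": (0.20, 0.05),
}
ACTIONS = list(TRANSFORMS)
SEQ_LEN = 3            # number of transformations in one obfuscation sequence
OVERHEAD_WEIGHT = 1.0  # trade-off between dissimilarity and overhead


def reward(applied, action):
    """Toy reward: gain (halved for each earlier repeat) minus weighted overhead."""
    gain, cost = TRANSFORMS[action]
    repeats = applied.count(action)
    return gain / (2 ** repeats) - OVERHEAD_WEIGHT * cost


def sequence_reward(seq):
    """Total reward of applying the transformations in `seq` in order."""
    return sum(reward(tuple(seq[:i]), a) for i, a in enumerate(seq))


def train(episodes=20000, alpha=0.1, gamma=1.0, seed=0):
    """Tabular Q-learning; the state is the tuple of transformations applied so far."""
    rng = random.Random(seed)
    q = defaultdict(float)  # (state, action) -> estimated return
    for _ in range(episodes):
        state = ()
        while len(state) < SEQ_LEN:
            # Q-learning is off-policy, so a uniformly random behaviour
            # policy is enough to learn the greedy (target) policy.
            action = rng.choice(ACTIONS)
            nxt = state + (action,)
            if len(nxt) == SEQ_LEN:
                future = 0.0  # episode ends once the sequence is full
            else:
                future = max(q[(nxt, a)] for a in ACTIONS)
            target = reward(state, action) + gamma * future
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q


def best_sequence(q):
    """Follow the greedy policy from the unobfuscated program."""
    state = ()
    while len(state) < SEQ_LEN:
        state += (max(ACTIONS, key=lambda a: q[(state, a)]),)
    return list(state)


if __name__ == "__main__":
    learned = best_sequence(train())
    print(learned, round(sequence_reward(learned), 3))
```

Under these toy numbers the learned sequence avoids the costly control-flow flattening step and avoids repeating a transformation, since repeats yield diminishing gains. Because the reward here does not depend on order, any permutation of the three remaining transformations is equally good; with real transformations, where the effects interact, order would matter and the state would need to be derived from the transformed program itself.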