Dynamic K-Gram Based Software Birthmark

Software theft is a threat to companies that consider code a core asset. A birthmark can help prove software theft by identifying intrinsic properties of a program: two programs with the same birthmark are likely to share a common origin. In this paper, we propose a novel dynamic birthmark. Using a dynamic program slicing tool with a given input, we extract k-gram instruction-sequence sets from the program; their union, which we call the birthmark, is used to identify the program uniquely. To evaluate the strength of the birthmarking technique, we compare the static k-gram based software birthmark with the dynamic approach by measuring birthmark similarity on programs transformed with academic obfuscation tools. The results show that the new birthmark provides both high credibility and high resilience. In particular, they show that the dynamic birthmark is more resilient to semantics-preserving transformations than the static k-gram birthmark.
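
To make the idea of a k-gram birthmark concrete, the following minimal Python sketch builds a birthmark as the union of k-gram sets taken over several instruction traces and compares two birthmarks. It is an illustration under stated assumptions, not the implementation used in this paper: the traces are assumed to be already available as lists of opcode names (here, hypothetical outputs of a slicing tool), the opcode names are made up for the example, and the containment ratio used as the similarity measure is one common choice that the abstract itself does not specify.

```python
from typing import Iterable, List, Set, Tuple

KGram = Tuple[str, ...]


def kgrams(trace: List[str], k: int) -> Set[KGram]:
    """Return the set of all length-k windows of opcodes in one trace."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A trace shorter than k yields an empty range and thus an empty set.
    return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}


def birthmark(traces: Iterable[List[str]], k: int) -> Set[KGram]:
    """Union of the k-gram sets of all traces (assumed: one trace per slice)."""
    result: Set[KGram] = set()
    for trace in traces:
        result |= kgrams(trace, k)
    return result


def containment(original: Set[KGram], suspect: Set[KGram]) -> float:
    """Fraction of the original's k-grams that also occur in the suspect.

    Assumption: this similarity measure is chosen for illustration only.
    """
    if not original:
        return 0.0
    return len(original & suspect) / len(original)


# Hypothetical opcode traces, invented for this example.
original_traces = [
    ["iload", "iload", "iadd", "istore"],
    ["iload", "iconst", "imul", "istore"],
]
suspect_traces = [
    ["iload", "iload", "iadd", "istore", "nop"],
]

bm_original = birthmark(original_traces, k=2)
bm_suspect = birthmark(suspect_traces, k=2)
similarity = containment(bm_original, bm_suspect)
```

In this toy setting, a similarity close to 1 would suggest that most of the original program's instruction sequences reappear in the suspect program. Because the measure is asymmetric, unrelated code added to the suspect does not lower the score.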