Identifying Reused Functions in Binary Code

Discovering reused binary functions is crucial for many security applications, particularly because modern malware often contains a significant number of functions borrowed from open-source software packages. Identifying such functions not only reduces the odds of common libraries leading to false correlations between unrelated code bases but also improves the efficiency of reverse engineering. We introduce a system for fingerprinting reused functions in binary code. More specifically, we introduce a new representation, the semantic integrated graph (SIG), which integrates the control flow graph, register flow graph, function-call graph, and other structural information into a joint data structure. This comprehensive representation captures different semantic descriptors of common functionalities in a unified manner as graph traces of SIGs.
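
To make the idea of a joint data structure concrete, the following is a minimal sketch of how several per-function graphs might be merged into one labeled graph and how short traces could be read off it. It is an illustration under stated assumptions, not the system's implementation: the function names `build_sig` and `traces`, the edge labels "cfg", "rfg", and "call", the assumption that all three graphs share node identifiers, and the use of fixed-length walks as traces are all hypothetical choices made for this example.

```python
from collections import defaultdict

def build_sig(cfg_edges, rfg_edges, call_edges):
    """Merge three edge lists into one graph (hypothetical SIG layout).

    Each edge (src, dst) maps to the set of source graphs that contain it.
    Assumes the three graphs use the same node identifiers.
    """
    sig = defaultdict(set)
    labeled = (("cfg", cfg_edges), ("rfg", rfg_edges), ("call", call_edges))
    for label, edges in labeled:
        for src, dst in edges:
            sig[(src, dst)].add(label)
    return dict(sig)

def traces(sig, length=2):
    """Collect the edge-label sequences along directed walks of `length` edges.

    Assumption: a "trace" is modeled here as a fixed-length walk.
    """
    succ = defaultdict(list)
    for (src, dst), labels in sig.items():
        succ[src].append((dst, "+".join(sorted(labels))))

    found = set()

    def walk(node, path):
        if len(path) == length:
            found.add(tuple(path))
            return
        for nxt, label in succ.get(node, []):
            walk(nxt, path + [label])

    for node in list(succ):
        walk(node, [])
    return found

# Toy function with three nodes; the edges are invented for illustration.
sig = build_sig(
    cfg_edges=[("a", "b"), ("b", "c")],
    rfg_edges=[("a", "b")],
    call_edges=[("b", "c")],
)
fingerprint = traces(sig, length=2)
```

In this sketch the resulting set of label sequences plays the role of a fingerprint: two functions whose merged graphs yield largely overlapping trace sets would be candidates for reuse. How traces are actually extracted and compared is not specified in the text above.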