Scalable Framework for Accurate Binary Code Comparison

Comparison of two binary files has many practical applications: the ability to detect programmatic changes between two versions, the ability to find old versions of statically linked libraries to prevent the use of well-known bugs, malware analysis, etc. In this article, a framework for comparison of binary files is presented. Framework uses IdaPro [1] disassembler and Binnavi [2] platform to recover structure of the target program and represent it as a call graph (CG). A program dependence graph (PDG) corresponds to each vertex of the CG. The proposed comparison algorithm consists of two main stages. At the first stage, several heuristics are applied to find the exact matches. Two functions are matched if at least one of the calculated heuristics is the same and unique in both binaries. At the second stage, backward and forward slicing is applied on matched vertices of CG to find further matches. According to empiric results heuristic method is effective and has high matching quality for unchanged or slightly modified functions. As a contradiction, to match heavily modified functions, binary code clone detection is used and it is based on finding maximum common subgraph for pair of PDGs. To achieve high performance on extensive binaries, the whole matching process is parallelized. The framework is tested on the number of real world libraries, such as python, openssh, openssl, libxml2, rsync, php, etc. Results show that in most cases more than 95% functions are truly matched. The tool is scalable due to parallelization of functions matching process and generation of PDGs and CGs.

[1]  Zheng Wang,et al.  BMAT - A Binary Matching Tool for Stale Profile Propagation , 2000, J. Instr. Level Parallelism.

[2]  Thomas Dullien,et al.  Graph-based comparison of Executable Objects , 2005 .

[3]  Koen De Bosschere,et al.  Protecting Your Software Updates , 2013, IEEE Security & Privacy.

[4]  Yang Xiang,et al.  Classification of malware using structured control flow , 2010 .

[5]  Andy King,et al.  BinSlayer: accurate comparison of binary executables , 2013, PPREW '13.

[6]  H. Kuhn The Hungarian method for the assignment problem , 1955 .

[7]  Halvar Flake,et al.  Structural Comparison of Executable Objects , 2004, DIMVA.

[8]  Andrew Walenstein,et al.  Exploiting Similarity Between Variants to Defeat Malware “ Vilo ” Method for Comparing and Searching Binary Programs , 2007 .

[9]  Debin Gao,et al.  BinHunt: Automatically Finding Semantic Differences in Binary Programs , 2008, ICICS.

[10]  Marius Gheorghescu AN AUTOMATED VIRUS CLASSIFICATION SYSTEM , 2006 .

[11]  Shamil Kurmangaleev,et al.  Platform-independent and scalable tool for binary code clone detection , 2016 .

[12]  Gran Vía,et al.  GRAPHS, ENTROPY AND GRID COMPUTING: AUTOMATIC COMPARISON OF MALWARE , 2008 .

[13]  Priya Narasimhan,et al.  Binary Function Clustering Using Semantic Hashes , 2012, 2012 11th International Conference on Machine Learning and Applications.

[14]  Debin Gao,et al.  iBinHunt: Binary Hunting with Inter-procedural Control Flow , 2012, ICISC.

[15]  Arun Lakhotia,et al.  Fast location of similar code fragments using semantic 'juice' , 2013, PPREW '13.

[16]  Thomas Dullien,et al.  Automated Attacker Correlation for Malicious Code , 2010 .