Compiler Provenance Attribution

Compiler identification is an essential component of binary toolchain analysis, with many applications in reverse engineering and malware analysis. Security investigators and cyber incident responders are often tasked with analyzing and attributing binary files obtained from malicious campaigns, and these binaries must be inspected quickly and reliably. Such binaries can be a source of intelligence on adversary tactics, techniques, and procedures. Compiler provenance information aids binary analysis by uncovering fingerprints of the development environment and related libraries, which accelerates the analysis. In this chapter, we present BinComp, a practical approach to extracting compiler provenance by analyzing the syntax, structure, and semantics of disassembled functions.
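The idea of reading compiler fingerprints from disassembled functions can be made concrete with a toy example of the syntactic layer only. The sketch below is not BinComp's feature set. It is a generic stand-in technique (opcode-bigram frequency vectors matched by cosine similarity). It assumes functions are already disassembled into instruction strings. The compiler labels (`compiler_A`, `compiler_B`) and their training functions are hypothetical toy data, not real compiler signatures.

```python
from collections import Counter
from math import sqrt


def opcode_ngrams(instructions, n=2):
    """Count opcode n-grams. Operands are dropped, so the features reflect
    which instructions the compiler chose and in what order."""
    opcodes = [ins.split()[0].lower() for ins in instructions if ins.strip()]
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_profiles(labeled_functions, n=2):
    """Aggregate n-gram counts per compiler label from labeled training functions."""
    profiles = {}
    for label, asm in labeled_functions:
        profiles.setdefault(label, Counter()).update(opcode_ngrams(asm, n))
    return profiles


def attribute(function_asm, profiles, n=2):
    """Return (label, score) for the profile most similar to the function."""
    feats = opcode_ngrams(function_asm, n)
    scores = {label: cosine(feats, prof) for label, prof in profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical toy training data; labels and instruction sequences are illustrative only.
training = [
    ("compiler_A", ["push ebp", "mov ebp, esp", "sub esp, 8",
                    "mov eax, [ebp+8]", "leave", "ret"]),
    ("compiler_B", ["mov edi, edi", "push ebp", "mov ebp, esp",
                    "mov eax, [ebp+8]", "pop ebp", "ret"]),
]
profiles = build_profiles(training)

query = ["mov edi, edi", "push ebp", "mov ebp, esp", "xor eax, eax", "pop ebp", "ret"]
label, score = attribute(query, profiles)
print(label, round(score, 2))  # compiler_B 0.6
```

Here the query shares three of its five opcode bigrams with the `compiler_B` profile and only one with `compiler_A`, so it is attributed to `compiler_B`. A realistic system would combine such syntactic cues with structural (control-flow) and semantic features, as the chapter describes, because single-layer features are easily perturbed by optimization levels or obfuscation.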
