Fine-grained binary code authorship identification

Binary code authorship identification is the task of determining the authors of a piece of binary code from a set of known authors. Modern software often contains code from multiple authors, yet existing techniques assume that each program binary is written by a single author. We present a new, finer-grained technique for the harder problem of determining the author of each basic block. Our evaluation shows that the technique identifies the author of a basic block with 52% accuracy among 282 authors, compared with 0.4% accuracy for random guessing, and that it provides a practical solution for identifying multiple authors in software.
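The random-guess figure can be checked with simple arithmetic. The sketch below assumes a uniform guess over 282 equally likely authors (the abstract does not state how the baseline was computed); the function name `random_guess_accuracy` is hypothetical and used only for illustration.

```python
def random_guess_accuracy(num_authors: int) -> float:
    """Expected accuracy of guessing uniformly at random among num_authors.

    Assumption: every author is equally likely, so the chance of a
    correct guess is 1 / num_authors.
    """
    if num_authors <= 0:
        raise ValueError("num_authors must be positive")
    return 1.0 / num_authors


# Figures taken from the abstract: 282 candidate authors, 52% reported accuracy.
baseline = random_guess_accuracy(282)
reported = 0.52

print(f"random-guess baseline: {baseline:.2%}")        # prints 0.35%
print(f"ratio over baseline: {reported / baseline:.0f}x")  # prints 147x
```

Under this assumption the baseline is about 0.35%, which matches the 0.4% quoted in the abstract after rounding to one decimal place.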
