Labeling library functions in stripped binaries

Binary code presents unique analysis challenges, particularly when debugging information has been stripped from the executable. Among the valuable information lost in stripping are the identities of standard library functions linked into the executable; knowing the identities of such functions can help to optimize automated analysis and is instrumental in understanding program behavior. Library fingerprinting attempts to restore the names of library functions in stripped binaries, using signatures extracted from reference libraries. Existing methods are brittle in the face of variations in the toolchain that produced the reference libraries and do not generalize well to new library versions. We introduce semantic descriptors, high-level representations of library functions that avoid the brittleness of existing approaches. We have extended a tool, unstrip, to apply this technique to fingerprint wrapper functions in the GNU C library. unstrip discovers functions in a stripped binary and outputs a new binary, with meaningful names added to the symbol table. Other tools can leverage these symbols to perform further analysis. We demonstrate that our semantic descriptors generalize well and substantially outperform existing library fingerprinting techniques.

[1]  Tibor Gyimóthy,et al.  Interprocedural static slicing of binary executables , 2003, Proceedings Third IEEE International Workshop on Source Code Analysis and Manipulation.

[2]  Christopher Krügel,et al.  Effective and Efficient Malware Detection at the End Host , 2009, USENIX Security Symposium.

[3]  Halvar Flake,et al.  Structural Comparison of Executable Objects , 2004, DIMVA.

[4]  Mike Emmerik Signatures for Library Functions in Executable Files , 1994 .

[5]  Christopher Krügel,et al.  Polymorphic Worm Detection Using Structural Information of Executables , 2005, RAID.

[6]  Thomas E. Cheatham,et al.  Symbolic Evaluation and the Analysis of Programs , 1979, IEEE Transactions on Software Engineering.

[7]  Cristina Cifuentes,et al.  Intraprocedural static slicing of binary executables , 1997, 1997 Proceedings International Conference on Software Maintenance.

[8]  Christopher Krügel,et al.  Scalable, Behavior-Based Malware Clustering , 2009, NDSS.

[9]  Barton P. Miller,et al.  Extracting compiler provenance from program binaries , 2010, PASTE '10.

[10]  P. David Coward Symbolic execution systems-a review , 1988, Softw. Eng. J..

[11]  Cristina Cifuentes,et al.  Decompilation of binary programs , 1995, Softw. Pract. Exp..

[12]  Henrik Theiling,et al.  Extracting safe and precise control flow from binaries , 2000, Proceedings Seventh International Conference on Real-Time Computing Systems and Applications.

[13]  Thomas W. Reps,et al.  WYSINWYX: What You See Is Not What You eXecute , 2005, VSTTE.

[14]  Barton P. Miller,et al.  Learning to Analyze Binary Computer Code , 2008, AAAI.

[15]  Somesh Jha,et al.  Synthesizing Near-Optimal Malware Specifications from Suspicious Behaviors , 2010, 2010 IEEE Symposium on Security and Privacy.

[16]  Thomas Dullien,et al.  Graph-based comparison of Executable Objects , 2005 .