Static Detection of Disassembly Errors

Static disassembly is a crucial first step in reverse engineering executable files, and there is a considerable body of work in reverse-engineering of binaries, as well as areas such as semantics-based security analysis, that assumes that the input executable has been correctly disassembled. However, disassembly errors, e.g., arising from binary obfuscations, can render this assumption invalid. This work describes a machine-learning-based approach, using decision trees, for statically identifying possible errors in a static disassembly; such potential errors may then be examined more closely, e.g., using dynamic analyses. Experimental results using a variety of input executables indicate that our approach performs well, correctly identifying most disassembly errors with relatively few false positives.

[1]  Somesh Jha,et al.  Static Analysis of Executables to Detect Malicious Patterns , 2003, USENIX Security Symposium.

[2]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[3]  Lieven Eeckhout,et al.  Measuring Program Similarity: Experiments with SPEC CPU Benchmark Suites , 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005..

[4]  Gregory R. Andrews,et al.  Disassembly of executable code revisited , 2002, Ninth Working Conference on Reverse Engineering, 2002. Proceedings..

[5]  J. Ross Quinlan,et al.  Induction of Decision Trees , 1986, Machine Learning.

[6]  Saumya K. Debray,et al.  Obfuscation of executable code to improve resistance to static disassembly , 2003, CCS '03.

[7]  Andrew Walenstein,et al.  An Approach towards Disassembly of Malicious Binary Executables , 2005 .

[8]  Daniel Bilar,et al.  Opcodes as predictor for malware , 2007, Int. J. Electron. Secur. Digit. Forensics.

[9]  Carsten Willems,et al.  Learning and Classification of Malware Behavior , 2008, DIMVA.

[10]  Robert Lyda,et al.  Using Entropy Analysis to Find Encrypted and Packed Malware , 2007, IEEE Security & Privacy.

[11]  R. Sekar,et al.  On the Limits of Information Flow Techniques for Malware Analysis and Containment , 2008, DIMVA.

[12]  Christopher Krügel,et al.  Static Disassembly of Obfuscated Binaries , 2004, USENIX Security Symposium.

[13]  Gregory R. Andrews,et al.  PLTO: A Link-Time Optimizer for the Intel IA-32 Architecture , 2007 .

[14]  Gregory R. Andrews,et al.  Binary Obfuscation Using Signals , 2007, USENIX Security Symposium.

[15]  Aiko M. Hormann,et al.  Programs for Machine Learning. Part I , 1962, Inf. Control..

[16]  Christopher Krügel,et al.  Detecting kernel-level rootkits through binary analysis , 2004, 20th Annual Computer Security Applications Conference.

[17]  Christopher Krügel,et al.  Exploring Multiple Execution Paths for Malware Analysis , 2007, 2007 IEEE Symposium on Security and Privacy (SP '07).