Recovery of jump table case statements from binary code

Abstract: One of the fundamental problems in the static analysis of binary (executable) code is recognizing, in a machine-independent way, the target addresses of n-conditional branches implemented via a jump table. Without these addresses, the decoding of the machine instructions for a given procedure is incomplete, leading to imprecise analysis of the code. In this paper we present a technique for recovering jump tables and their target addresses in a machine- and compiler-independent way. The technique is based on slicing and copy propagation. The assembly code of a procedure that contains an indexed jump is transformed into a normal form, which allows us to determine where the jump table is located and what information it contains (e.g. offsets from the table or absolute addresses). The technique has been implemented and tested on SPARC and Pentium code generated by C, C++, Fortran and Pascal compilers. Our tests show that up to 89% more of the code in a text segment can be found using this technique than with the standard method of decoding. The technique was developed as part of our retargetable binary translation framework, UQBT; however, it is also suitable for other binary-manipulation and analysis tools such as binary profilers, instrumentors and decompilers.
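To make the last step concrete, the following is a minimal sketch of how target addresses might be read once the location and kind of a jump table are known. It is not the paper's implementation: the function name `recover_targets`, its parameters, the 32-bit entry size, and the assumption that the number of entries is already known are all hypothetical choices made for illustration. It covers only the two table kinds named in the abstract (absolute addresses and offsets from the table).

```python
import struct
from typing import List


def recover_targets(image: bytes, image_base: int, table_addr: int,
                    num_entries: int, kind: str,
                    little_endian: bool = True) -> List[int]:
    """Read a jump table from a loaded image and return its target addresses.

    Hypothetical helper: assumes 32-bit entries and that table_addr,
    num_entries and kind were already determined by an earlier analysis.
    """
    if kind not in ("absolute", "offset"):
        raise ValueError("kind must be 'absolute' or 'offset'")
    # Absolute entries are unsigned addresses; offsets may be negative.
    code = "I" if kind == "absolute" else "i"
    fmt = ("<" if little_endian else ">") + code
    start = table_addr - image_base  # position of the table inside the image
    targets = []
    for i in range(num_entries):
        (word,) = struct.unpack_from(fmt, image, start + 4 * i)
        if kind == "absolute":
            targets.append(word)               # entry is the target itself
        else:
            targets.append(table_addr + word)  # entry is relative to the table
    return targets


# Toy image loaded at 0x1000 with a three-entry table at 0x1010.
abs_image = bytes(16) + struct.pack("<3I", 0x1100, 0x1120, 0x1140)
off_image = bytes(16) + struct.pack("<3i", 0x20, -0x8, 0x40)
abs_targets = recover_targets(abs_image, 0x1000, 0x1010, 3, "absolute")
off_targets = recover_targets(off_image, 0x1000, 0x1010, 3, "offset")
```

Each recovered address can then be handed to the decoder as an additional entry point, which is how code that a plain linear or recursive decoding pass never reaches becomes visible.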
