Re-engineering from legacy executable (binary) files is greatly facilitated by identifying and naming statically linked library functions. This paper presents an efficient method for generating files of patterns; each pattern is a transformation of the first several bytes of a library function's executable code. Given a suitable pattern file, a candidate function can be identified in linear time. One pattern file is generated for each combination of compiler vendor, version and memory model (where applicable). The process of identifying these parameters in a given executable file also identifies the main function of the program, i.e. the start of the code written by the user. The pattern files are produced automatically from a compiler's library file in a few seconds, with no user intervention required. Due to various limitations, not all library functions can be identified correctly; a small number will be either incorrectly identified or not identified. Optimal perfect hash functions are used to keep the pattern files compact and efficient to process.
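The pattern-matching idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the pattern length of 8 bytes, the toy normalisation rule (zeroing the two operand bytes after an `0xE8` near-call opcode, which vary from link to link), the function names, and the sample byte sequences are all hypothetical. An ordinary Python dictionary stands in for the optimal perfect hash function mentioned above.

```python
from typing import Dict, Optional

PATTERN_LEN = 8  # assumed; the abstract says only "the first several bytes"


def make_pattern(code: bytes) -> bytes:
    """Transform the first PATTERN_LEN bytes of a function into a pattern.

    Toy rule (an assumption for illustration): the two bytes following an
    0xE8 opcode byte are treated as a link-dependent call displacement and
    zeroed. Short functions are padded with zero bytes.
    """
    buf = bytearray(code[:PATTERN_LEN].ljust(PATTERN_LEN, b"\x00"))
    i = 0
    while i < PATTERN_LEN:
        if buf[i] == 0xE8:
            # Zero the displacement bytes that fall inside the pattern window.
            for j in range(i + 1, min(i + 3, PATTERN_LEN)):
                buf[j] = 0
            i += 3
        else:
            i += 1
    return bytes(buf)


def build_pattern_file(library: Dict[str, bytes]) -> Dict[bytes, str]:
    """Map each library function's pattern to its name.

    If two functions share a pattern, the later one overwrites the earlier;
    such a function could then be misidentified.
    """
    return {make_pattern(code): name for name, code in library.items()}


def identify(patterns: Dict[bytes, str], candidate: bytes) -> Optional[str]:
    """Return the library function name for a candidate, or None."""
    return patterns.get(make_pattern(candidate))


# Hypothetical library contents: name -> code bytes as stored in the library.
library = {
    "_helper": bytes([0x55, 0x8B, 0xEC, 0xE8, 0x00, 0x00, 0x5D, 0xC3]),
    "_tiny": bytes([0x33, 0xC0, 0xC3]),
}
patterns = build_pattern_file(library)

# The same function as it might appear in a linked executable, with the
# call displacement filled in by the linker and unrelated code following it.
linked = bytes([0x55, 0x8B, 0xEC, 0xE8, 0x34, 0x12, 0x5D, 0xC3, 0x90])
print(identify(patterns, linked))
print(identify(patterns, bytes([0x90] * 8)))
```

Because the same transformation is applied to the library code and to the candidate, the lookup reduces to a single hash probe whose cost depends on the fixed pattern length rather than on the number of library functions. In this sketch, one such table would be built per combination of compiler vendor, version and memory model.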