Automatic validation for binary translation

Binary translation is an important technique for porting programs because it allows binary code for one platform to execute on another. It is widely used in virtual machines and emulators. However, implementing a correct (and efficient) binary translator remains challenging because many delicate details must be handled carefully. Manually identifying mistranslated instructions in an application program is difficult, especially when the application is large. Automatic validation tools are therefore needed to uncover hidden problems in a binary translator. We developed a new validation tool for binary translators. In our tool, the original binary code and the translated binary code run simultaneously. Both versions continuously send their architecture states and their stored values (the values written into memory cells) to a third process, the validator. Since most mistranslated instructions result in wrong architecture states during execution, the validator can catch most of them by comparing the corresponding architecture states. Corresponding architecture states may differ because of (1) translation errors, (2) different (but correct) memory layouts, and (3) return values of certain system calls. The need to distinguish these three sources of differences makes comparing architecture states very difficult, if not impossible. In our validator, we take special care to make the memory layouts exactly the same and to make corresponding system calls always return exactly the same values in the original and in the translated binaries. As a result, any difference between corresponding architecture states indicates a mistranslated instruction emitted by the binary translator. Besides solving the architecture-state-comparison problem, we also propose two methods to speed up the automatic validation.
The first is the validation-block method, which reduces the number of validations while keeping the accuracy of instruction-level validation. The second is quick validation, which provides extremely fast validation at the expense of less accurate error information. Our validator can be applied to different binary translators. In our experiments, the validator successfully validated programs translated by static, dynamic, and hybrid binary translators.

Highlights
- An automatic validator supports static, dynamic, and hybrid binary translators.
- Instruction-level validation is performed by comparing the architecture states and stored values.
- We propose two mechanisms that reduce the comparisons to simple equality checks.
- Two acceleration methods are provided to make the validation process faster.
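
The abstract does not spell out how the validation-block method works, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes each run has already been recorded as a list of snapshots, and that memory layouts and system-call results have been made identical, so plain equality is a valid check. The names `Snapshot`, `block_summary`, `first_mismatch`, and `validate_by_block` are hypothetical. The validator compares one summary per block (the registers at the end of the block plus every value stored inside it) and rescans instruction by instruction only when a block disagrees.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    """One record sent to the validator (hypothetical layout)."""
    index: int            # position of the instruction in the trace
    registers: tuple      # architecture state, e.g. register values
    stores: tuple = ()    # (address, value) pairs written by this instruction


def block_summary(trace: List[Snapshot], start: int, end: int):
    """Registers at the end of the block plus all stores made inside it."""
    stores = tuple(s for snap in trace[start:end] for s in snap.stores)
    return trace[end - 1].registers, stores


def first_mismatch(original, translated, start, end) -> Optional[int]:
    """Index of the first differing snapshot in [start, end), or None."""
    for i in range(start, end):
        if original[i] != translated[i]:
            return i
    return None


def validate_by_block(original, translated, block_size) -> Tuple[Optional[int], int]:
    """Return (index of first divergent instruction or None, block checks done)."""
    checks = 0
    n = min(len(original), len(translated))
    for start in range(0, n, block_size):
        end = min(start + block_size, n)
        checks += 1
        if block_summary(original, start, end) != block_summary(translated, start, end):
            # Only a failing block pays for instruction-level comparison.
            return first_mismatch(original, translated, start, end), checks
    return None, checks
```

With two 8-instruction traces and `block_size=4`, a matching pair costs two block checks instead of eight instruction checks, while a divergence is still reported at the instruction where it first appears. One limitation of this sketch: a wrong register value that is overwritten before the block ends and never stored would go unnoticed.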
