A Dynamic-Static Combined Code Layout Reorganization Approach for Dynamic Binary Translation

Dynamic binary translation (DBT) has attracted much attention as a powerful technique for adapting software across different ISAs at run time. It offers unprecedented flexibility in controlling and modifying a program while it runs. However, its inherently high overhead has troubled researchers for many years. To reduce this overhead, this paper presents a dynamic-static combined approach that reorganizes the layout of the software cache. In the dynamic phase, we run the program under emulation to collect profile information and the translated target code; in particular, the path of the execution flow is tracked. In the static phase, based on the profile information collected in the dynamic phase, we first build traces by code replication and then reorganize the layout of the target code, placing the hottest traces at the top of the software cache. Because of accurate prediction and improved locality, the execution stream concentrates on a small region with fewer control transfers. This approach can greatly reduce the overhead of DBT, provided that the program is run repeatedly. Experimental results on the SPEC 2000 benchmarks show that our approach reduces run time by more than 30% on average.
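
The static phase described above can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the profile format (per-block and per-edge execution counts), the greedy hottest-successor rule, the rule that a trace stops when it reaches the head of an existing trace, and all names (`build_traces`, `layout_cache`, `hot_threshold`, the block labels and sizes) are assumptions chosen for illustration. The sketch shows only the two ideas named in the abstract: a block shared by several hot paths is replicated into each trace, and traces are placed in the cache from hottest to coldest.

```python
from collections import defaultdict


def build_traces(block_counts, edge_counts, hot_threshold):
    """Greedily grow traces from hot blocks (illustrative heuristic only).

    A block may be copied into several traces (code replication), but a
    trace stops when it would loop back on itself, reach the head of
    another trace, or run into a cold block.
    """
    succs = defaultdict(list)
    for (src, dst), count in edge_counts.items():
        succs[src].append((count, dst))

    # Candidate trace heads: hot blocks, hottest first (name breaks ties).
    hot = [b for b, c in block_counts.items() if c >= hot_threshold]
    hot.sort(key=lambda b: (-block_counts[b], b))

    traces, heads, covered = [], set(), set()
    for seed in hot:
        if seed in covered:  # already placed inside an earlier trace
            continue
        heads.add(seed)
        trace, cur = [seed], seed
        while succs[cur]:
            _, nxt = max(succs[cur])  # follow the hottest outgoing edge
            if (nxt in trace or nxt in heads
                    or block_counts.get(nxt, 0) < hot_threshold):
                break
            trace.append(nxt)  # nxt may already exist in another trace
            cur = nxt
        covered.update(trace)
        traces.append(trace)
    return traces


def layout_cache(traces, block_counts, block_sizes):
    """Assign cache offsets so that the hottest traces come first."""
    ordered = sorted(traces, key=lambda t: -block_counts[t[0]])
    placement, offset = [], 0
    for trace in ordered:
        placement.append((offset, trace))
        offset += sum(block_sizes[b] for b in trace)
    return placement


# Hypothetical profile of a loop: header H, two bodies B1/B2, latch L, exit X.
block_counts = {"H": 1000, "B1": 700, "B2": 300, "L": 1000, "X": 1}
edge_counts = {("H", "B1"): 700, ("H", "B2"): 300, ("B1", "L"): 700,
               ("B2", "L"): 300, ("L", "H"): 999, ("L", "X"): 1}
block_sizes = {"H": 16, "B1": 32, "B2": 24, "L": 8, "X": 4}  # bytes, made up

traces = build_traces(block_counts, edge_counts, hot_threshold=100)
placement = layout_cache(traces, block_counts, block_sizes)
for offset, trace in placement:
    print(offset, trace)
```

In this example the latch block `L` is reachable from both loop bodies, so it is copied into both traces; the cold exit block `X` is left out of the hot region entirely. A real translator would also have to patch branch targets and exit stubs after relocation, which the sketch omits.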
