Memory optimization of dynamic binary translators for embedded systems

Dynamic binary translators (DBTs) are becoming increasingly important because of their power and flexibility. DBT-based services are valuable for all types of platforms. However, the high memory demands of DBTs present an obstacle for embedded systems. Most research on DBT design has a performance focus, which often drives up the DBT memory demand. In this article, we present a memory-oriented approach to DBT design. We consider the class of translation-based DBTs and their sources of memory demand: cached translated code, cached auxiliary code, and DBT data structures. We explore aspects of DBT design that impact these memory demand sources and present strategies to mitigate memory demand. We also explore performance optimizations for DBTs that handle memory demand by imposing a limit on it and repeatedly flushing translations to stay within that limit, which replaces the memory demand problem with a performance degradation problem. Our optimizations that mitigate memory demand also improve performance.
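The trade-off described above, where a bounded code cache turns a memory problem into a retranslation problem, can be illustrated with a toy model. The sketch below is not the article's implementation: the class `BoundedCodeCache`, the helper `run`, the fixed 64-byte fragment size, and the flush-everything policy are all assumptions chosen for illustration.

```python
class BoundedCodeCache:
    """Toy model of a translated-code cache with a hard byte limit.

    Hypothetical illustration only: real DBTs also store auxiliary code
    (such as exit stubs) and bookkeeping data structures.
    """

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0
        self.fragments = {}    # source address -> size of translated fragment
        self.translations = 0  # number of (re)translations performed
        self.flushes = 0       # number of full cache flushes

    def execute(self, src_addr, translated_size):
        """Return True on a cache hit, False if the block was (re)translated."""
        if src_addr in self.fragments:
            return True
        if translated_size > self.limit_bytes:
            raise ValueError("fragment larger than the cache limit")
        if self.used_bytes + translated_size > self.limit_bytes:
            # Assumed policy: discard every cached translation at once.
            self.fragments.clear()
            self.used_bytes = 0
            self.flushes += 1
        self.fragments[src_addr] = translated_size
        self.used_bytes += translated_size
        self.translations += 1
        return False


def run(trace, limit_bytes, fragment_size=64):
    """Replay a trace of source addresses; return (translations, flushes)."""
    cache = BoundedCodeCache(limit_bytes)
    for addr in trace:
        cache.execute(addr, fragment_size)
    return cache.translations, cache.flushes


if __name__ == "__main__":
    loop = [0, 1, 2] * 4   # a three-block loop executed four times
    print(run(loop, 192))  # limit holds all three fragments
    print(run(loop, 128))  # limit holds only two fragments
```

In this model, a limit that holds the whole loop needs 3 translations and no flushes, while a limit one fragment too small forces 12 translations and 5 flushes for the same trace. That extra retranslation work is the performance degradation that a bounded cache trades for lower memory use.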
