Balancing memory and performance through selective flushing of software code caches

Dynamic binary translators (DBTs) are becoming increasingly important because of their power and flexibility. However, the high memory demands of DBTs present an obstacle on all platforms, and especially on embedded systems. Memory demand is typically controlled by placing a limit on cached translations and forcing the DBT to flush all translations upon reaching the limit. This solution is inefficient because many of the flushed translations must be retranslated. Ideally, translations should be selectively flushed to minimize retranslations for a given memory limit. However, three obstacles exist: (1) it is difficult to predict which selections will minimize retranslation, (2) selective flushing incurs greater bookkeeping overhead than full flushing, and (3) the emergence of multicore processors and multi-threaded programming complicates most flushing algorithms. These issues have led to the widespread adoption of full flushing as a standard protocol. In this paper, we present a partial flushing approach aimed at reducing retranslation overhead and improving overall performance, given a fixed memory budget. Our technique applies uniformly to single-threaded and multi-threaded guest applications.
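
To make the trade-off between full and partial flushing concrete, the following is a minimal sketch, not the technique presented in the paper. It assumes a toy code cache whose memory limit is expressed as a number of cached translations, and a hypothetical oldest-first eviction order; the function name `count_retranslations`, the `flush_fraction` parameter, and the example trace are all illustrative assumptions.

```python
from collections import OrderedDict


def count_retranslations(trace, capacity, flush_fraction):
    """Count how often an already-translated block must be translated again.

    trace: sequence of guest block addresses in execution order.
    capacity: maximum number of cached translations (stand-in for a memory limit).
    flush_fraction: share of the cache evicted when it is full (0 < f <= 1);
        1.0 models a full flush. Eviction is oldest-first, which is an
        assumption of this sketch, not the paper's selection policy.
    """
    cache = OrderedDict()  # insertion order equals translation order
    seen = set()           # blocks translated at least once
    retranslations = 0
    for block in trace:
        if block in cache:
            continue  # hit: the cached translation is reused
        if len(cache) >= capacity:
            # Cache is at its limit: evict the oldest entries.
            n_evict = max(1, int(capacity * flush_fraction))
            for _ in range(n_evict):
                cache.popitem(last=False)
        if block in seen:
            retranslations += 1  # translated before, then flushed
        seen.add(block)
        cache[block] = True
    return retranslations


# Hypothetical trace: four blocks run once, then a loop over blocks 2, 3, 4.
trace = [0, 1, 2, 3, 4, 2, 3, 4, 2, 3]
full = count_retranslations(trace, capacity=4, flush_fraction=1.0)
partial = count_retranslations(trace, capacity=4, flush_fraction=0.5)
print(full, partial)
```

On this particular trace the full flush discards blocks 2 and 3 just before they are reused, while evicting only the older half keeps them cached. Other traces can favor either policy, which reflects the first obstacle listed above: it is hard to predict which selections will minimize retranslation.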
