YETI: a graduallY extensible trace interpreter

The design of new programming languages benefits from interpretation, which provides a simple initial implementation, flexibility to explore new language features, and portability to many platforms. The only downside is speed of execution: a large performance gap remains between even efficient interpreters and mixed-mode systems that include a just-in-time (JIT) compiler. Augmenting an interpreter with a JIT, however, is not a small task. Today, JITs used for Java™ are loosely coupled to the interpreter, with method call sites being the only transition points between interpreted and native code. To compile whole methods, the JIT must duplicate a sizable amount of functionality already provided by the interpreter, leading to a "big bang" development effort before the JIT can be deployed. Adding a JIT to an interpreter would be easier if the JIT could instead leverage that existing functionality. In earlier work we showed that packaging virtual instructions as lightweight callable routines is an efficient way to build an interpreter. In this paper we describe how callable bodies help our interpreter to efficiently identify and run traces. Our closely coupled dynamic compiler can fall back on the interpreter in various ways, permitting an incremental approach in which additional performance gains are realized as the compiler is extended in two dimensions: (i) generating code for more types of virtual instructions, and (ii) identifying larger compilation units. Currently, Yeti identifies straight-line regions of code and traces, and generates non-optimized code for roughly 50 Java integer and object bytecodes. Yeti runs roughly twice as fast as a direct-threaded interpreter on SPECjvm98 benchmarks.
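
The idea of callable bodies and traces can be illustrated with a toy stack machine. The sketch below is a minimal illustration in Python, not Yeti's implementation: the `VM` class, the instruction bodies, `record_trace`, `run_trace`, and the `loop_heads` hint are all hypothetical names invented for this example, and no native code is generated. Each virtual instruction is a small callable routine, a trace is the sequence of bodies observed along one pass through a loop, and replay leaves the trace and falls back on the ordinary dispatch loop as soon as control takes a different path.

```python
# Toy illustration only: every name below is hypothetical, not Yeti's API.

class VM:
    def __init__(self, program, local_vars):
        self.program = program            # list of (body, operand) pairs
        self.local_vars = list(local_vars)
        self.stack = []
        self.pc = 0
        self.halted = False

# Each virtual instruction is a small callable "body" that updates VM state.
def load(vm, index):
    vm.stack.append(vm.local_vars[index])
    vm.pc += 1

def store(vm, index):
    vm.local_vars[index] = vm.stack.pop()
    vm.pc += 1

def push(vm, value):
    vm.stack.append(value)
    vm.pc += 1

def add(vm, _):
    b = vm.stack.pop()
    a = vm.stack.pop()
    vm.stack.append(a + b)
    vm.pc += 1

def jump_if_nonzero(vm, target):
    vm.pc = target if vm.stack.pop() != 0 else vm.pc + 1

def halt(vm, _):
    vm.halted = True

def step(vm):
    """Plain interpretation: dispatch one body."""
    body, operand = vm.program[vm.pc]
    body(vm, operand)

def record_trace(vm, entry_pc):
    """Interpret from entry_pc, recording the path until it loops back."""
    trace = []
    while not vm.halted:
        body, operand = vm.program[vm.pc]
        body(vm, operand)
        trace.append((body, operand, vm.pc))  # vm.pc is the observed successor
        if vm.pc == entry_pc:
            return trace
    return None  # the path halted before looping back

def run_trace(vm, trace):
    """Replay a trace; return False as soon as control leaves the recorded path."""
    for body, operand, expected_pc in trace:
        body(vm, operand)
        if vm.pc != expected_pc:
            return False  # trace exit: the caller falls back on step()
    return True

def run(vm, loop_heads=()):
    """Mixed-mode loop: replay traces where they exist, interpret otherwise."""
    traces = {}
    while not vm.halted:
        pc = vm.pc
        if pc in traces:
            while run_trace(vm, traces[pc]):
                pass
        elif pc in loop_heads:
            trace = record_trace(vm, pc)
            if trace is not None:
                traces[pc] = trace
        else:
            step(vm)
    return vm

# local 0 is a counter n, local 1 an accumulator: acc += n; n -= 1; repeat while n != 0
SUM_LOOP = [
    (load, 0), (load, 1), (add, None), (store, 1),
    (load, 0), (push, -1), (add, None), (store, 0),
    (load, 0), (jump_if_nonzero, 0),
    (halt, None),
]
```

For example, `run(VM(SUM_LOOP, [3, 0]), loop_heads={0})` records the loop body during its first iteration, replays it as a trace for the remaining iterations, and returns to `step` for the final `halt` once the loop-back guard fails. Calling `run` without `loop_heads` interprets the same program one body at a time and produces the same final state, which is the sense in which trace support can be added gradually on top of a working interpreter.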
