Optimizing indirect branch prediction accuracy in virtual machine interpreters

Interpreters designed for efficiency execute a huge number of indirect branches and can spend more than half of their execution time in indirect branch mispredictions. Branch target buffers (BTBs) are the most widely available form of indirect branch prediction; however, their prediction accuracy for existing interpreters is only 2% to 50%. In this article we investigate two methods for improving the prediction accuracy of BTBs for interpreters: replicating virtual machine (VM) instructions and combining sequences of VM instructions into superinstructions. We investigate static (interpreter build-time) and dynamic (interpreter runtime) variants of these techniques and compare them with each other and with several combinations. To show their generality, we have implemented these optimizations in VMs for both Java and Forth. These techniques can eliminate nearly all of the dispatch branch mispredictions and have other benefits, resulting in speedups by a factor of up to 4.55 over efficient threaded-code interpreters, and by a factor of up to 1.34 over techniques relying on dynamic superinstructions alone.
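
To make the effect of the two techniques concrete, the following is a minimal sketch rather than the authors' implementation. It assumes an idealized BTB that has one entry per dispatch branch and always predicts that branch's previous target, and it models a threaded-code interpreter in which every copy of a VM instruction ends in its own dispatch branch. The function name `btb_mispredictions` and the toy VM programs (`A`, `B`, `GOTO`, the replicas `A1`/`A2`, and the superinstruction `A_B`) are hypothetical names introduced only for illustration.

```python
def btb_mispredictions(program, iterations):
    """Count dispatch mispredictions for a loop of VM instructions.

    Each entry in `program` names the code copy that implements that VM
    instruction. In this model every code copy ends in its own indirect
    dispatch branch, and the (idealized, assumed) BTB remembers only the
    last target taken by that branch.
    """
    btb = {}      # dispatch branch (keyed by code copy) -> last target
    misses = 0
    total = 0
    for _ in range(iterations):
        for i, copy in enumerate(program):
            # The loop body wraps around, so the last instruction
            # dispatches back to the first one.
            target = program[(i + 1) % len(program)]
            if btb.get(copy) != target:
                misses += 1
            btb[copy] = target
            total += 1
    return misses, total


# Plain threaded code: both occurrences of A share one dispatch branch,
# whose target alternates between B and GOTO, so it never predicts right.
plain = ["A", "B", "A", "GOTO"]

# Replication: each occurrence of A gets its own copy (and its own branch).
replicated = ["A1", "B", "A2", "GOTO"]

# Superinstruction: the sequence "A B" is merged into one instruction,
# which also removes one dispatch per loop iteration.
superinstr = ["A_B", "A", "GOTO"]

for name, prog in [("plain", plain),
                   ("replicated", replicated),
                   ("superinstruction", superinstr)]:
    misses, total = btb_mispredictions(prog, 100)
    print(f"{name}: {misses} mispredictions in {total} dispatches")
```

In this toy model the shared dispatch branch of `A` mispredicts on every execution, while replication and superinstructions leave only the cold-start mispredictions. Real BTBs have limited capacity and the replicated code is larger, so the sketch illustrates the mechanism only and says nothing about the measured speedups.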
