Branch prediction and the performance of interpreters — Don't trust folklore

Interpreters have been used in many contexts. They provide portability and ease of development at the expense of performance. The literature of the past decade analyzes why interpreters are slow and proposes many software techniques to improve them. A large proportion of this work focuses on the dispatch loop, and in particular on the implementation of the switch statement: typically an indirect branch instruction. Folklore attributes a significant penalty to this branch, due to its high misprediction rate. We revisit this assumption, considering state-of-the-art branch predictors and the three most recent Intel processor generations on current interpreters. Using both hardware counters on Haswell, the latest Intel processor generation, and simulation of the ITTAGE predictor, we show that the accuracy of indirect branch prediction is no longer critical for interpreters. We further compare the characteristics of these interpreters and analyze why the indirect branch is less important than before.
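
To make the dispatch loop concrete, here is a minimal sketch in C of a switch-based interpreter for a toy stack machine. The opcode names (OP_PUSH, OP_ADD, OP_MUL, OP_HALT), the function run, and the stack size are assumptions made for illustration; they are not taken from the interpreters studied in the paper. For realistic opcode sets, compilers commonly lower such a switch to a jump table followed by a single indirect branch, which is the branch the folklore refers to. With only a handful of cases, as here, a compiler may choose a different lowering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for a toy stack machine (illustration only). */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

#define STACK_MAX 64

/* Executes bytecode and returns the top of the stack at OP_HALT,
 * or -1 if an unknown opcode is encountered. */
static int64_t run(const uint8_t *code)
{
    int64_t stack[STACK_MAX];
    size_t sp = 0;   /* index of the next free stack slot */
    size_t pc = 0;   /* index of the next bytecode to fetch */

    for (;;) {
        /* Dispatch: every executed bytecode passes through this one
         * switch, so a jump-table lowering funnels all opcodes
         * through the same indirect branch. */
        switch (code[pc++]) {
        case OP_PUSH:                 /* operand is the next byte */
            assert(sp < STACK_MAX);
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:                  /* pop two values, push their sum */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:                  /* pop two values, push their product */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            return -1;                /* unknown opcode */
        }
    }
}
```

As a usage example, running the program { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT } through run computes (2 + 3) * 4. Because the target of the dispatch branch depends on the bytecode sequence rather than on a fixed pattern, older predictors that only remember the last target tend to mispredict it, whereas history-based predictors such as ITTAGE can correlate the target with preceding branches.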
