Improving the performance of object-oriented languages with dynamic predication of indirect jumps

Indirect jump instructions are used to implement increasingly-common programming constructs such as virtual function calls, switch-case statements, jump tables, and interface calls. The performance impact of indirect jumps is likely to increase because indirect jumps with multiple targets are difficult to predict even with specialized hardware. This paper proposes a new way of handling hard-to-predict indirect jumps: dynamically predicating them. The compiler (static or dynamic) identifies indirect jumps that are suitable for predication along with their control-flow merge (CFM) points. The hardware predicates theinstructions between different targets of the jump and its CFM point if the jump turns out to be hard-to-predict at run time. If the jump would actually have been mispredicted, its dynamic predication eliminates a pipeline flush, thereby improving performance. Our evaluations show that Dynamic Indirect jump Predication (DIP) improves the performance of a set of object-oriented applications including the Java DaCapo benchmark suite by 37.8% compared to a commonly-used branch target buffer based predictor, while also reducing energy consumption by 24.8%. We compare DIP to three previously proposed indirect jump predictors and find that it provides the best performance and energy-efficiency.

[1]  Onur Mutlu,et al.  Profile-assisted Compiler Support for Dynamic Predication in Diverge-Merge Processors , 2007, International Symposium on Code Generation and Optimization (CGO'07).

[2]  David R. Kaeli,et al.  Predicting indirect branches via data compression , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.

[3]  Ronald G. Dreslinski,et al.  The M5 Simulator: Modeling Networked Systems , 2006, IEEE Micro.

[4]  Onur Mutlu,et al.  Diverge-Merge Processor: Generalized and Energy-Efficient Dynamic Predication , 2007, IEEE Micro.

[5]  L. Peter Deutsch,et al.  Efficient implementation of the smalltalk-80 system , 1984, POPL.

[6]  Andreas Moshovos,et al.  Improving virtual function call target prediction via dependence-based pre-computation , 1999, ICS '99.

[7]  Michal Revucky Optimizing Indirect Branch Prediction Accuracy in Virtual Machine Interpreters , 2007 .

[8]  Bowen Alpern,et al.  Efficient implementation of Java interfaces: Invokeinterface considered harmless , 2001, OOPSLA '01.

[9]  David J. Sager,et al.  The microarchitecture of the Pentium 4 processor , 2001 .

[10]  Luca Cardelli,et al.  On understanding types, data abstraction, and polymorphism , 1985, CSUR.

[11]  Mark N. Wegman,et al.  Efficiently computing static single assignment form and the control dependence graph , 1991, TOPL.

[12]  R. D. Valentine,et al.  The Intel Pentium M processor: Microarchitecture and performance , 2003 .

[13]  Eric Rotenberg,et al.  Assigning confidence to conditional branch predictions , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[14]  Ken Kennedy,et al.  Conversion of control dependence to data dependence , 1983, POPL '83.

[15]  David Gregg,et al.  Optimizing indirect branch prediction accuracy in virtual machine interpreters , 2003, PLDI '03.

[16]  Dirk Grunwald,et al.  Quantifying Behavioral Differences Between C and C++ Programs , 1994 .

[17]  E. Smith,et al.  Selective Dual Path Execution , 1996 .

[18]  Edward M. Riseman,et al.  The Inhibition of Potential Parallelism by Conditional Jumps , 1972, IEEE Transactions on Computers.

[19]  Balaram Sinharoy,et al.  POWER4 system microarchitecture , 2002, IBM J. Res. Dev..

[20]  Dirk Grunwald,et al.  Dynamic hammock predication for non-predicated instruction set architectures , 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).

[21]  Karel Driesen,et al.  Multi-stage Cascaded Prediction , 1999, Euro-Par.

[22]  Toshiaki Yasue,et al.  A study of devirtualization techniques for a Java Just-In-Time compiler , 2000, OOPSLA '00.

[23]  Craig Chambers,et al.  Optimizing Dynamically-Typed Object-Oriented Languages With Polymorphic Inline Caches , 1991, ECOOP.

[24]  John Paul Shen,et al.  Register renaming and scheduling for dynamic execution of predicated code , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[25]  Karel Driesen,et al.  Accurate indirect branch prediction , 1998, ISCA.

[26]  Dirk Grunwald,et al.  Selective eager execution on the PolyPath architecture , 1998, ISCA.

[27]  Pierre Michaud,et al.  A case for (partially) TAgged GEometric history length branch prediction , 2006, J. Instr. Level Parallelism.

[28]  Yale N. Patt,et al.  Target prediction for indirect jumps , 1997, ISCA '97.

[29]  Daniel A. Jiménez,et al.  Dynamic branch prediction with perceptrons , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[30]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[31]  Onur Mutlu,et al.  VPC prediction: reducing the cost of indirect branches via hardware-based dynamic devirtualization , 2007, ISCA '07.

[32]  Richard E. Kessler,et al.  The Alpha 21264 microprocessor , 1999, IEEE Micro.

[33]  B. Ramakrishna Rau,et al.  The Cydra 5 departmental supercomputer: design philosophies, decisions, and trade-offs , 1989, Computer.

[34]  Amer Diwan,et al.  The DaCapo benchmarks: java benchmarking development and analysis , 2006, OOPSLA '06.

[35]  Sanjay Bhansali,et al.  Framework for instruction-level tracing and analysis of program executions , 2006, VEE '06.

[36]  Urs Hölzle,et al.  Optimizing dynamically-dispatched calls with run-time type feedback , 1994, PLDI '94.

[37]  Onur Mutlu,et al.  Dynamic Predication of Indirect Jumps , 2007, IEEE Computer Architecture Letters.

[38]  Onur Mutlu,et al.  Diverge-Merge Processor (DMP): Dynamic Predicated Execution of Complex Control-Flow Graphs Based on Frequently Executed Paths , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).