VPC prediction: reducing the cost of indirect branches via hardware-based dynamic devirtualization

Indirect branches have become increasingly common in modular programs written in modern object-oriented languages and virtual machine based runtime systems. Unfortunately, the prediction accuracy of indirect branches has not improved as much as that of conditional branches. Furthermore, previously proposed indirect branch predictors usually require a significant amount of extra hardware storage and complexity, which makes them less attractive to implement. This paper proposes a new technique for handling indirect branches, called Virtual Program Counter (VPC) prediction. The key idea of VPC prediction is to treat a single indirect branch as multiple virtual conditional branches in hardware for prediction purposes. Our technique predicts each of the virtual conditional branches using the existing conditional branch prediction hardware. Thus, no separate storage structure is required for predicting indirect branch targets. Our evaluation shows that VPC prediction improves average performance by 26.7% compared to a commonly-used branch target buffer based predictor on 12 indirect branch intensive applications. VPC prediction achieves the performance improvement provided by at least a 12KB (and usually a 192KB) tagged target cache predictor on half of the examined applications. We show that VPC prediction can be used with any existing conditional branch prediction mechanism and that the accuracy of VPC prediction improves when a more accurate conditional branch predictor is used.

[1]  Onur Mutlu,et al.  Diverge-Merge Processor (DMP): Dynamic Predicated Execution of Complex Control-Flow Graphs Based on Frequently Executed Paths , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[2]  Karel Driesen,et al.  Accurate indirect branch prediction , 1998, ISCA.

[3]  André Seznec,et al.  Analysis of the O-GEometric history length branch predictor , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[4]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[5]  Toshiaki Yasue,et al.  A study of devirtualization techniques for a Java Just-In-Time compiler , 2000, OOPSLA '00.

[6]  Onur Mutlu,et al.  Dynamic Predication of Indirect Jumps , 2007, IEEE Computer Architecture Letters.

[7]  Pierre Michaud,et al.  A case for (partially) TAgged GEometric history length branch prediction , 2006, J. Instr. Level Parallelism.

[8]  Yale N. Patt,et al.  Target prediction for indirect jumps , 1997, ISCA '97.

[9]  Balaram Sinharoy,et al.  POWER4 system microarchitecture , 2002, IBM J. Res. Dev..

[10]  David R. Kaeli,et al.  Branch History Table Prediction of Moving Target Branches due to Subroutine Returns , 1991, ISCA.

[11]  Dirk Grunwald,et al.  Reducing indirect function call overhead in C++ programs , 1994, POPL '94.

[12]  Daniel A. Jiménez,et al.  Dynamic branch prediction with perceptrons , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[13]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[14]  Yale N. Patt,et al.  Improving branch prediction accuracy by reducing pattern history table interference , 1996, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Technique.

[15]  David R. Kaeli,et al.  Predicting indirect branches via data compression , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.

[16]  Richard E. Kessler,et al.  The Alpha 21264 microprocessor , 1999, IEEE Micro.

[17]  R. D. Valentine,et al.  The Intel Pentium M processor: Microarchitecture and performance , 2003 .

[18]  Alan Jay Smith,et al.  Branch Prediction Strategies and Branch Target Buffer Design , 1995, Computer.

[19]  Karel Driesen,et al.  Multi-stage Cascaded Prediction , 1999, Euro-Par.

[20]  Luca Cardelli,et al.  On understanding types, data abstraction, and polymorphism , 1985, CSUR.

[21]  Amer Diwan,et al.  The DaCapo benchmarks: java benchmarking development and analysis , 2006, OOPSLA '06.

[22]  Yale N. Patt,et al.  Increasing the instruction fetch rate via multiple branch prediction and a branch address cache , 1993, ICS '93.

[23]  Andreas Moshovos,et al.  Improving virtual function call target prediction via dependence-based pre-computation , 1999, ICS '99.

[24]  L. Peter Deutsch,et al.  Efficient implementation of the smalltalk-80 system , 1984, POPL.

[25]  Karel Driesen,et al.  The cascaded predictor: economical and adaptive branch target prediction , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.

[26]  Sanjay Bhansali,et al.  Framework for instruction-level tracing and analysis of program executions , 2006, VEE '06.

[27]  S. McFarling Combining Branch Predictors , 1993 .

[28]  David Grove,et al.  Measurement and Application of Dynamic Receiver Class Distributions , 2007 .

[29]  WegnerPeter,et al.  On understanding types, data abstraction, and polymorphism , 1985 .

[30]  Craig Chambers,et al.  Optimizing Dynamically-Typed Object-Oriented Languages With Polymorphic Inline Caches , 1991, ECOOP.

[31]  Michal Revucky Optimizing Indirect Branch Prediction Accuracy in Virtual Machine Interpreters , 2007 .

[32]  David J. Sager,et al.  The microarchitecture of the Pentium 4 processor , 2001 .

[33]  David Grove,et al.  Profile-guided receiver class prediction , 1995, OOPSLA.

[34]  Timothy J. Slegel,et al.  The IBM eServer z990 microprocessor , 2004, IBM J. Res. Dev..

[35]  Allan Hartstein,et al.  The optimum pipeline depth for a microprocessor , 2002, ISCA.

[36]  Rajiv Kapoor,et al.  Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[37]  David Gregg,et al.  Optimizing indirect branch prediction accuracy in virtual machine interpreters , 2003, PLDI '03.

[38]  Dirk Grunwald,et al.  Quantifying Behavioral Differences Between C and C++ Programs , 1994 .

[39]  D.R. Kaeli,et al.  Branch history table prediction of moving target branches due to subroutine returns , 1991, [1991] Proceedings. The 18th Annual International Symposium on Computer Architecture.

[40]  Urs Hölzle,et al.  Optimizing dynamically-dispatched calls with run-time type feedback , 1994, PLDI '94.

[41]  Yale N. Patt,et al.  An analysis of correlation and predictability: what makes two-level branch predictors work , 1998, ISCA.

[42]  Yale N. Patt,et al.  Increasing the instruction fetch rate via multiple branch prediction and a branch address cache , 1993, ICS '93.

[43]  Fredrik Larsson,et al.  Simics: A Full System Simulation Platform , 2002, Computer.