Code Placement for Improving Dynamic Branch Prediction Accuracy

Code placement techniques have traditionally improved instruction fetch bandwidth by increasing instruction locality and decreasing the number of taken branches. However, traditional code placement techniques have less benet in the presence of a trace cache that alters the placement of instructions in the instruction cache. Moreover, as pipelines have become deeper to accommodate increasing clock rates, branch misprediction penalties have become a significant impediment to performance. We evaluate pattern history table partitioning, a feedback directed code placement technique that explicitly places conditional branches so that they are less likely to interfere destructively with one another in branch prediction tables. On SPEC CPU benchmarks running on an Intel Pentium 4, branch mispredictions are reduced by up to 22% and 3.5% on average. This reduction yields a speedup of up to 16.0% and 4.5% on average. By contrast, branch alignment, a previous code placement technique, yields only up to a 4.7% speedup and less than 1% on average.

[1]  Trevor N. Mudge,et al.  The YAGS branch prediction scheme , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.

[2]  Richard E. Kessler,et al.  The Alpha 21264 microprocessor , 1999, IEEE Micro.

[3]  Aleksandar Milenkovic,et al.  Microbenchmarks for determining branch predictor organization , 2004 .

[4]  James R. Larus,et al.  Branch prediction for free , 1993, PLDI '93.

[5]  Y.N. Patt,et al.  Using Hybrid Branch Predictors to Improve Branch Prediction Accuracy in the Presence of Context Switches , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[6]  Chung-Ta King,et al.  Walk-Time Address Adjustment for Improving the Accuracy of Dynamic Branch Prediction , 1999, IEEE Trans. Computers.

[7]  Yale N. Patt,et al.  A two-level approach to making class predictions , 2003, 36th Annual Hawaii International Conference on System Sciences, 2003. Proceedings of the.

[8]  Yale N. Patt,et al.  The agree predictor: a mechanism for reducing negative branch history interference , 1997, ISCA '97.

[9]  S. McFarling Program optimization for instruction caches , 1989, ASPLOS 1989.

[10]  Todd M. Austin,et al.  The SimpleScalar tool set, version 2.0 , 1997, CARN.

[11]  Gary S. Tyson,et al.  Analyzing the working set characteristics of branch execution , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.

[12]  Michael D. Smith,et al.  Procedure placement using temporal-ordering information , 1999, TOPL.

[13]  Dirk Grunwald,et al.  Evidence-based static branch prediction using machine learning , 1997, TOPL.

[14]  Harish Patil,et al.  Combining static and dynamic branch prediction to reduce destructive aliasing , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).

[15]  Trevor N. Mudge,et al.  The bi-mode branch predictor , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[16]  Michael D. Smith,et al.  Static correlated branch prediction , 1999, TOPL.

[17]  Dirk Grunwald,et al.  Reducing branch costs via branch alignment , 1994, ASPLOS VI.

[18]  Eric Rotenberg,et al.  Trace cache: a low latency approach to high bandwidth instruction fetching , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[19]  Donald J. Hatfield,et al.  Program Restructuring for Virtual Memory , 1971, IBM Syst. J..

[20]  Yale N. Patt,et al.  Branch classification: a new mechanism for improving branch predictor performance , 1994, MICRO.

[21]  Daniel A. Jiménez,et al.  The impact of delay on the design of branch predictors , 2000, MICRO 33.

[22]  Mateo Valero,et al.  The effect of code reordering on branch prediction , 2000, Proceedings 2000 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00622).