Control flow prediction with tree-like subgraphs for superscalar processors

To fetch a large number of instructions per cycle, wide-issue superscalar processors must predict the outcomes of multiple branches in a cycle and fetch instruction blocks from multiple targets. This paper investigates a control flow prediction scheme that predicts the outcomes of multiple branches with a single prediction. Instead of predicting each conditional branch individually, the scheme treats a tree-like subgraph of the executed program's control flow graph as a single prediction unit and predicts the target of one subgraph at a time, thereby allowing the superscalar fetch mechanism to go past multiple branches per cycle. The approach is evaluated using the MIPS architecture for a 12-way superscalar processor. For the SPEC integer benchmarks, the effective fetch size improves by more than 50% over an identical processor that uses branch prediction. No appreciable difference in prediction accuracy was observed, even though each control flow prediction selects one of four possible outcomes.
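The idea of predicting one subgraph exit instead of several individual branches can be illustrated with a small sketch. This is not the predictor evaluated in the paper: the table organization, the per-exit saturating counters, and the names `SubgraphPredictor`, `NUM_EXITS`, `COUNTER_MAX`, and `run` are all assumptions made for illustration. The only detail taken from the abstract is that a prediction chooses one of four outcomes.

```python
from dataclasses import dataclass, field

NUM_EXITS = 4     # from the abstract: one of four outcomes per prediction
COUNTER_MAX = 3   # assumption: 2-bit saturating counter per exit


@dataclass
class SubgraphPredictor:
    """Hypothetical predictor keyed by the root address of a subgraph."""
    table: dict = field(default_factory=dict)

    def _counters(self, root_pc):
        # Lazily allocate one counter per exit for a newly seen subgraph.
        return self.table.setdefault(root_pc, [0] * NUM_EXITS)

    def predict(self, root_pc):
        """Return the exit with the highest counter (ties go to the lowest index)."""
        counters = self._counters(root_pc)
        return max(range(NUM_EXITS), key=lambda i: counters[i])

    def update(self, root_pc, actual_exit):
        """Strengthen the exit actually taken and weaken the others."""
        counters = self._counters(root_pc)
        for i in range(NUM_EXITS):
            if i == actual_exit:
                counters[i] = min(COUNTER_MAX, counters[i] + 1)
            else:
                counters[i] = max(0, counters[i] - 1)


def run(trace):
    """Replay (root_pc, actual_exit) pairs; return the fraction predicted correctly."""
    predictor = SubgraphPredictor()
    correct = 0
    for root_pc, actual_exit in trace:
        if predictor.predict(root_pc) == actual_exit:
            correct += 1
        predictor.update(root_pc, actual_exit)
    return correct / len(trace) if trace else 0.0
```

In this sketch a single call to `predict` stands in for the several branch predictions that would otherwise be needed to cross the same subgraph, which is the property the abstract credits for the larger effective fetch size.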
