Control Flow Prediction Schemes for Wide-Issue Superscalar Processors

To achieve high performance, wide-issue superscalar processors have to fetch a large number of instructions per cycle. Conditional branches are the primary impediment to increasing the fetch bandwidth because they are very frequent and can alter the flow of control. To overcome this problem, these processors need to predict the outcome of multiple branches in a cycle. This paper investigates two control flow prediction schemes that predict the effective outcome of multiple branches with a single prediction. Instead of treating branches as the basic units of prediction, these schemes treat subgraphs of the control flow graph of the executed program as the basic units and predict the target of an entire subgraph at a time, thereby allowing the superscalar fetch mechanism to go past multiple branches in a cycle. The first scheme uses sequential block-like subgraphs and the second uses tree-like subgraphs to make the control flow predictions. Both schemes do a 1-out-of-4 prediction, as opposed to the 1-out-of-2 prediction done by branch-level prediction schemes. The two schemes are evaluated using a MIPS ISA-based 12-way superscalar microarchitecture. Improvements in effective fetch size of approximately 25 percent and 50 percent, respectively, are observed over identical microprocessors that use branch-level prediction. No appreciable difference in prediction accuracy was observed, even though the control flow prediction schemes choose among four outcomes rather than two.
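
The idea of a single 1-out-of-4 prediction per subgraph can be illustrated with a minimal sketch. It is not the predictor evaluated in the paper. It assumes a depth-2 tree-like subgraph with four exits, a table indexed by the subgraph's entry address, and one 2-bit saturating counter per exit. The names `SubgraphPredictor`, `exit_of_tree`, `NUM_EXITS`, and `COUNTER_MAX` are hypothetical.

```python
from collections import defaultdict

NUM_EXITS = 4    # 1-out-of-4 prediction, as stated in the abstract
COUNTER_MAX = 3  # 2-bit saturating counters (an assumption, not from the paper)


class SubgraphPredictor:
    """Hypothetical predictor: one row of per-exit counters per subgraph."""

    def __init__(self):
        # Rows are created lazily, keyed by the subgraph's entry address.
        self.table = defaultdict(lambda: [0] * NUM_EXITS)

    def predict(self, entry_pc):
        # Pick the exit with the highest counter; ties go to the lowest index.
        counters = self.table[entry_pc]
        return counters.index(max(counters))

    def update(self, entry_pc, actual_exit):
        # Strengthen the exit actually taken and weaken the other three.
        counters = self.table[entry_pc]
        for i in range(NUM_EXITS):
            if i == actual_exit:
                counters[i] = min(COUNTER_MAX, counters[i] + 1)
            else:
                counters[i] = max(0, counters[i] - 1)


def exit_of_tree(first_taken, second_taken):
    """Map the outcomes of two branches in a depth-2 tree to an exit 0..3."""
    return (int(first_taken) << 1) | int(second_taken)


predictor = SubgraphPredictor()
for _ in range(3):
    predictor.update(0x400, exit_of_tree(True, False))
predicted_exit = predictor.predict(0x400)
```

A branch-level predictor would need two separate lookups to cover the two branches in such a tree. Here a single lookup names the subgraph exit, which is what lets the fetch unit move past both branches in one cycle.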
