An Exploration of Instruction Fetch Requirement in Out-of-Order Superscalar Processors

The performance of superscalar processors depends on many parameters with correlated effects. This paper explores the relations between some of these parameters, and more particularly, the requirement in instruction fetch bandwidth. We introduce new enhancements to increase the bandwidth of conventional instruction fetch engines. However, experiments show that the performance does not increase proportionally to the fetch. Once the measured IPC is half the instruction fetch bandwidth, increasing the fetch bandwidth brings very little improvement. In order to better understand this behavior, we develop a model from the empirical observation that the available instruction parallelism grows as the square root of the instruction window size. From the model, we derive that the fetch bandwidth requirement grows as the square root of the distance between mispredicted branches. We also verify experimentally that, to double the IPC, one should both double the fetch bandwidth and decrease the number of mispredicted branches fourfold.

[1]  Edward M. Riseman,et al.  The Inhibition of Potential Parallelism by Conditional Jumps , 1972, IEEE Transactions on Computers.

[2]  G.S. Sohi,et al.  Dynamic instruction reuse , 1997, ISCA '97.

[3]  Dirk Grunwald,et al.  Dynamic hammock predication for non-predicated instruction set architectures , 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).

[4]  Pierre Michaud,et al.  Trading Conflict And Capacity Aliasing In Conditional Branch Predictors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[5]  James E. Smith,et al.  The performance potential of data dependence speculation and collapsing , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[6]  Stéphan Jourdan,et al.  A novel renaming scheme to exploit value temporal locality through physical register reuse and unification , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.

[7]  Eric Rotenberg,et al.  Trace cache: a low latency approach to high bandwidth instruction fetching , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[8]  Karel Driesen,et al.  The cascaded predictor: economical and adaptive branch target prediction , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.

[9]  Mikko H. Lipasti,et al.  Exceeding the dataflow limit via value prediction , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[10]  Yale N. Patt,et al.  Branch history table indexing to prevent pipeline bubbles in wide-issue superscalar processors , 1993, MICRO 1993.

[11]  Stéphan Jourdan,et al.  Exploring instruction-fetch bandwidth requirement in wide-issue superscalar processors , 1999, 1999 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00425).

[12]  Yale N. Patt,et al.  Increasing the instruction fetch rate via multiple branch prediction and a branch address cache , 1993, ICS '93.

[13]  Pascal Sainrat,et al.  Multiple-block ahead branch predictors , 1996, ASPLOS VII.

[14]  Dirk Grunwald,et al.  Selective eager execution on the PolyPath architecture , 1998, ISCA.

[15]  Dirk Grunwald,et al.  Reducing branch costs via branch alignment , 1994, ASPLOS VI.

[16]  Scott A. Mahlke,et al.  Characterizing the impact of predicated execution on branch prediction , 1994, Proceedings of MICRO-27. The 27th Annual IEEE/ACM International Symposium on Microarchitecture.

[17]  Burzin A. Patel,et al.  Optimization of instruction fetch mechanisms for high issue rates , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[18]  Trevor N. Mudge,et al.  Instruction fetching: Coping with code bloat , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.