Increasing the Instruction Fetch Rate via Block-Structured Instruction Set Architectures

To exploit larger amounts of instruction level parallelism, processors are being built with wider issue widths and larger numbers of functional units. Instruction fetch rate must also be increased in order to effectively exploit the performance potential of such processors. Block-structured ISAs provide an effective means of increasing the instruction fetch rate. We define an optimization, called block enlargement, that can be applied to a block-structured ISA to increase the instruction fetch rate of a processor that implements that ISA. We have constructed a compiler that generates block-structured ISA code, and a simulator that models the execution of that code on a block-structured ISA processor. We show that for the SPECint95 benchmarks, the block-structured ISA improves the performance of an aggressive wide issue, dynamically scheduled processor by 15% while using simpler microarchitectural mechanisms to support wide issue and dynamic scheduling.

[1]  James E. Smith,et al.  Path-based next trace prediction , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[2]  Manoj Franklin,et al.  Control flow prediction with tree-like subgraphs for superscalar processors , 1995, MICRO 1995.

[3]  Sanjay J. Patel,et al.  Critical Issues Regarding the Trace Cache Fetch Mechanism , 1997 .

[4]  Scott Mahlke,et al.  Effective compiler support for predicated execution using the hyperblock , 1992, MICRO 1992.

[5]  Yale N. Patt,et al.  Critical issues regarding HPS, a high performance microarchitecture , 1985, MICRO 18.

[6]  Burzin A. Patel,et al.  Optimization of instruction fetch mechanisms for high issue rates , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[7]  Trevor N. Mudge,et al.  Instruction fetching: Coping with code bloat , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[8]  Gurindar S. Sohi,et al.  Multiscalar processors , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[9]  Joseph A. Fisher,et al.  2n-way jump microinstruction hardware and an effective instruction binding method , 1980, SIGM.

[10]  Alan Jay Smith,et al.  Branch Prediction Strategies and Branch Target Buffer Design , 1995, Computer.

[11]  Soo-Mook Moon,et al.  An efficient resource-constrained global scheduling technique for superscalar and VLIW processors , 1992, MICRO 25.

[12]  Scott A. Mahlke,et al.  Effective compiler support for predicated execution using the hyperblock , 1992, MICRO 25.

[13]  Yale N. Patt,et al.  Facilitating superscalar processing via a combined static/dynamic register renaming scheme , 1994, Proceedings of MICRO-27. The 27th Annual IEEE/ACM International Symposium on Microarchitecture.

[14]  Peter Y.-T. Hsu,et al.  Highly concurrent scalar processing , 1986, ISCA '86.

[15]  Dionisios N. Pnevmatikatos,et al.  Guarded execution and branch prediction in dynamic ILP processors , 1994, Proceedings of 21 International Symposium on Computer Architecture.

[16]  Yale N. Patt,et al.  Alternative implementations of two-level adaptive branch prediction , 1992, ISCA '92.

[17]  Scott A. Mahlke,et al.  Characterizing the impact of predicated execution on branch prediction , 1994, Proceedings of MICRO-27. The 27th Annual IEEE/ACM International Symposium on Microarchitecture.

[18]  Kemal Ebcioglu,et al.  An efficient resource-constrained global scheduling technique for superscalar and VLIW processors , 1992, MICRO 1992.

[19]  Alexandru Nicolau,et al.  Efficient hardware for multiway jumps and pre-fetches , 1985, MICRO 18.

[20]  Yale Patt,et al.  Exploiting fine-grained parallelism through a combination of hardware and software techniques , 1991, ISCA '91.

[21]  Yale N. Patt,et al.  Increasing the instruction fetch rate via multiple branch prediction and a branch address cache , 1993, ICS '93.

[22]  Joseph A. Fisher,et al.  Trace Scheduling: A Technique for Global Microcode Compaction , 1981, IEEE Transactions on Computers.

[23]  Alfred V. Aho,et al.  Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.

[24]  Gurindar S. Sohi,et al.  The Expandable Split Window Paradigm for Exploiting Fine-grain Parallelism , 1992, [1992] Proceedings the 19th Annual International Symposium on Computer Architecture.

[25]  Walid A. Najjar,et al.  Design of storage hierarchy in multithreaded architectures , 1995, MICRO 1995.

[26]  Yale N. Patt,et al.  A two-level approach to making class predictions , 2003, 36th Annual Hawaii International Conference on System Sciences, 2003. Proceedings of the.

[27]  Scott A. Mahlke,et al.  The superblock: An effective technique for VLIW and superscalar compilation , 1993, The Journal of Supercomputing.

[28]  Yale N. Patt,et al.  Checkpoint Repair for High-Performance Out-of-Order Execution Machines , 1987, IEEE Transactions on Computers.

[29]  Eric Rotenberg,et al.  Trace cache: a low latency approach to high bandwidth instruction fetching , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[30]  James E. Smith,et al.  A study of branch prediction strategies , 1981, ISCA '98.

[31]  Pascal Sainrat,et al.  Multiple-block ahead branch predictors , 1996, ASPLOS VII.

[32]  Gurindar S. Sohi,et al.  Register traffic analysis for streamlining inter-operation communication in fine-grain parallel processors , 1992, MICRO 1992.

[33]  Yale N. Patt,et al.  HPS, a new microarchitecture: rationale and introduction , 1985, MICRO 18.

[34]  Yale N. Patt,et al.  Increasing the instruction fetch rate via multiple branch prediction and a branch address cache , 1993, ICS '93.

[35]  Gurindar S. Sohi,et al.  The expandable split window paradigm for exploiting fine-grain parallelsim , 1992, ISCA '92.