Select-free instruction scheduling logic

Pipelining allows processors to exploit parallelism. Unfortunately, critical loops, pieces of logic that must evaluate in a single cycle to meet IPC (instructions per cycle) goals, prevent deeper pipelining. In today's processors, one of these loops is the instruction scheduling (wakeup and select) logic [10]. This paper describes a technique that pipelines this loop by breaking it into two smaller loops: a critical, single-cycle loop for wakeup and a non-critical, potentially multi-cycle loop for select. For the 12 SPECint2000 benchmarks, a machine with two-cycle select logic (i.e., three-cycle scheduling logic) using this technique has an average IPC 15% greater than a machine with three-cycle pipelined conventional scheduling logic, and an IPC within 3% of a machine of the same pipeline depth with one-cycle (ideal) scheduling logic. Since select accounts for more than half of the scheduling latency [10], this technique could significantly increase clock frequency with minimal impact on IPC.
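To see why the scheduling loop is critical, the following toy model may help. It is a minimal sketch of conventional wakeup-and-select scheduling only, not of the select-free technique the paper proposes. The function name `schedule`, the instruction encoding, and the parameters `loop_latency` and `width` are assumptions made for illustration; every instruction is assumed to execute in one cycle and to write a unique (renamed) destination tag.

```python
def schedule(insts, loop_latency, width=2):
    """Return the issue cycle of each instruction in a toy scheduler.

    insts:        list of (dest_tag, [source_tags]) in program order.
    loop_latency: cycles between selecting a producer and its dependents
                  becoming selectable (1 models an atomic wakeup+select loop).
    width:        maximum number of instructions selected per cycle.
    """
    produced = {dest for dest, _ in insts}   # tags written inside the window
    ready_at = {}                            # tag -> first cycle consumers may issue
    issue_cycle = [None] * len(insts)
    cycle = 0
    while None in issue_cycle:
        picked = 0
        for i, (dest, srcs) in enumerate(insts):      # select: oldest first
            if issue_cycle[i] is not None or picked == width:
                continue
            # A source is ready if nothing in the window produces it,
            # or if its producer's tag broadcast has already arrived.
            if all(s not in produced or ready_at.get(s, float("inf")) <= cycle
                   for s in srcs):
                issue_cycle[i] = cycle
                ready_at[dest] = cycle + loop_latency  # wakeup of dependents
                picked += 1
        cycle += 1
    return issue_cycle


# A serial dependence chain: each instruction needs the previous result.
chain = [("r1", []), ("r2", ["r1"]), ("r3", ["r2"]), ("r4", ["r3"])]
one_cycle_loop = schedule(chain, loop_latency=1)
three_cycle_loop = schedule(chain, loop_latency=3)
```

In this model, dependent instructions issue back to back only when the loop closes in a single cycle; stretching the same loop over three cycles spaces the chain three cycles apart, while independent instructions are unaffected.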

[1] James E. Smith, et al. Instruction Issue Logic in Pipelined Supercomputers, 1984, IEEE Trans. Computers.

[2] Kenneth C. Yeager. The MIPS R10000 superscalar microprocessor, 1996, IEEE Micro.

[3] James E. Smith, et al. Complexity-Effective Superscalar Processors, 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[4] Fischer. Issue Logic For A 600 MHz Out-of-order Execution, 1997.

[5] Rajiv Gupta, et al. Superscalar execution with dynamic data forwarding, 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No. 98EX192).

[6] T. Fischer, et al. Issue Logic For A 600 MHz Out-of-order Execution, 1997, Symposium 1997 on VLSI Circuits.

[7] Yale N. Patt, et al. On pipelining dynamic instruction scheduling logic, 2000, MICRO 33.

[8] Bradley C. Kuszmaul, et al. Circuits for wide-window superscalar processors, 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No. RS00201).

[9] Ramon Canal, et al. A low-complexity issue logic, 2000, ICS '00.

[10] David J. Sager, et al. The microarchitecture of the Pentium 4 processor, 2001.

[11] Enric Morancho, et al. Recovery mechanism for latency misprediction, 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.

[12] William J. Bowhill, et al. Design of High-Performance Microprocessor Circuits, 2001.

[13] Pierre Michaud, et al. Data-flow prescheduling for large instruction windows in out-of-order processors, 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.