An Enhancement for a Scheduling Logic Pipelined over two Cycles

Out of order processors use the dynamic scheduling logic both to expose and to exploit parallelism. Pipelining this logic may sacrifice the ability to execute dependent instructions in consecutive cycles. Several previous studies have shown that pipelining the scheduling logic over two cycles degrades performance; our evaluations, in a 4-way machine, on SPEC-2000 integer benchmarks show a performance degradation about 11% compared to an unpipelined scheduling logic. In this work, we present two non-speculative enhancements for a scheduling logic pipelined over two cycles. The idea is computing in advance which instructions will be woken-up by all instructions that are currently competing for selection. Once all of them have been selected, the pre-computed group of instructions can compete for selection in next cycle. The enhancement goal is to tolerate the scheduling-loop latency when not enough ILP is available through the scheduling of dependent instructions in consecutive cycles. Our results in a 4-way machine show that our two proposed enhancements perform, on average, slightly better than two previously proposed speculative schedulers. The performance of our proposals is within a 2.6% and 2% of an unpipelined ideal scheduler.

[1]  S. Tomita,et al.  A high-speed dynamic instruction scheduling scheme for supersealar processors , 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.

[2]  Mikko H. Lipasti,et al.  An approach for implementing efficient superscalar CISC processors , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[3]  Peter G. Sassone,et al.  Dynamic Strands: Collapsing Speculative Dependence Chains for Reducing Pipeline Communication , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[4]  Mikko H. Lipasti,et al.  Half-price architecture , 2003, ISCA '03.

[5]  Norman P. Jouppi,et al.  Quantifying the Complexity of Superscalar Processors , 2002 .

[6]  Rajiv Gupta,et al.  Instruction Wake-Up in Wide Issue Superscalars , 2001, Euro-Par.

[7]  Yale N. Patt,et al.  On pipelining dynamic instruction scheduling logic , 2000, MICRO 33.

[8]  Todd M. Austin,et al.  Efficient dynamic scheduling through tag elimination , 2002, ISCA.

[9]  Mikko H. Lipasti,et al.  Macro-op scheduling: relaxing scheduling loop constraints , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[10]  Masahiro Goshima,et al.  A high-speed dynamic instruction scheduling scheme for superscalar processors , 2001, MICRO.

[11]  Kuo-Su Hsiao,et al.  An efficient wakeup design for energy reduction in high-performance superscalar processors , 2005, CF '05.

[12]  Brad Calder,et al.  Picking statistically valid and early simulation points , 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.

[13]  Fischer Issue Logic For A 600 MHz Out-of-order Execution , 1997, Symposium 1997 on VLSI Circuits.

[14]  James E. Smith,et al.  Instruction Issue Logic in Pipelined Supercomputers , 1984, IEEE Trans. Computers.

[15]  Yu Bai,et al.  A dynamically reconfigurable mixed in-order/out-of-order issue queue for power-aware microprocessors , 2003, IEEE Computer Society Annual Symposium on VLSI, 2003. Proceedings..

[16]  Todd M. Austin,et al.  The SimpleScalar tool set, version 2.0 , 1997, CARN.

[17]  Yale N. Patt,et al.  Select-free instruction scheduling logic , 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.

[18]  Brad Calder,et al.  Automatically characterizing large scale program behavior , 2002, ASPLOS X.

[19]  T. Austin,et al.  Cyclone: a broadcast-free dynamic instruction scheduler with selective replay , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..

[20]  Mikko H. Lipasti,et al.  Macro-op Scheduling: Relaxing Scheduling Loop Constraints , 2003, MICRO.

[21]  BurgerDoug,et al.  The SimpleScalar tool set, version 2.0 , 1997 .

[22]  Amir Roth,et al.  Dataflow Mini-Graphs: Amplifying Superscalar Capacity and Bandwidth , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[23]  Ramon Canal,et al.  A low-complexity issue logic , 2000, ICS '00.

[24]  Pierre Michaud,et al.  Data-flow prescheduling for large instruction windows in out-of-order processors , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.