RingScalar: A Complexity-Effective Out-of-Order Superscalar Microarchitecture

RingScalar is a complexity-effective microarchitecture for out-of-order superscalar processors that reduces the area, latency, and power of all major structures in the instruction flow. The design divides an N-way superscalar into N columns connected in a unidirectional ring, where each column contains a portion of the instruction window, a bank of the register file, and an ALU. Because most decoded instructions are waiting on just one operand, the design uses only a single tag per issue window entry and restricts instruction wakeup and value bypass to communication with the neighboring column. Detailed simulations of four-issue single-threaded machines running SPECint2000 show that RingScalar achieves an IPC only 13% lower than that of an idealized superscalar, while providing large reductions in area, power, and circuit latency.
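
The ring organization described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the paper's implementation: the names `Ring`, `dispatch`, and `wakeup` are hypothetical, and the placement policy (putting a consumer in the column immediately downstream of its producer) is an assumption chosen to show how a single tag per entry and neighbor-only wakeup might fit together.

```python
class Ring:
    """Toy model of columns connected in a unidirectional ring (hypothetical)."""

    def __init__(self, n_columns):
        # Each column holds its slice of the issue window as (name, tag) pairs.
        # One tag per entry: an instruction waits on at most one operand here.
        self.windows = [[] for _ in range(n_columns)]

    def neighbor(self, col):
        # Unidirectional ring: column i only talks to column (i + 1) mod N.
        return (col + 1) % len(self.windows)

    def dispatch(self, name, tag, producer_col):
        # Assumed policy: place the consumer next to its producer's column.
        target = self.neighbor(producer_col)
        self.windows[target].append((name, tag))
        return target

    def wakeup(self, producer_col, tag):
        # The result tag is compared only against the neighboring column.
        target = self.neighbor(producer_col)
        ready = [n for n, t in self.windows[target] if t == tag]
        self.windows[target] = [(n, t) for n, t in self.windows[target] if t != tag]
        return ready


ring = Ring(4)
ring.dispatch("add", tag=7, producer_col=3)  # lands in column 0 (wraps around)
print(ring.wakeup(producer_col=3, tag=7))
```

In this sketch a result produced in column 3 is compared against the entries of column 0 only, so no tag is broadcast across the whole window.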
