Distributed reorder buffer schemes for low power

We consider two approaches for reducing complexity and power dissipation in processors that use a separate register file to maintain committed register values. The first approach relies on a distributed implementation of the reorder buffer (ROB) that spreads the centralized ROB structure across the function units (FUs), with each distributed component sized to match the FU workload and provided with one write port and two read ports. The second approach combines the previously proposed retention latches with a distributed ROB implementation that uses minimally-ported distributed components. This combination avoids read and write port conflicts on the distributed ROB components (with the exception of possible port conflicts during commitment) and does not incur the associated performance degradation. Our designs are evaluated using simulations of the SPEC 2000 benchmarks and SPICE simulations of the actual ROB layouts in a 0.18 micron process. ROB power savings of up to 49% can be realized with only a 1.7% performance loss on average.
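The distributed organization described above can be illustrated with a small behavioral sketch. This is a toy model under stated assumptions, not the evaluated design: the class name `DistributedROB`, the program-order queue of (FU, slot) pairs, the `read_ports` parameter, and the FU names in the usage note are all hypothetical, introduced only to show per-FU components of different sizes, in-order commitment, and how a port conflict during commitment can end a commit cycle early.

```python
from collections import deque


class DistributedROB:
    """Toy model: one small buffer per function unit plus a program-order queue."""

    def __init__(self, sizes):
        # sizes: hypothetical mapping of FU name -> entries in that FU's component
        self.components = {fu: {} for fu in sizes}          # fu -> {slot: result}
        self.free = {fu: deque(range(n)) for fu, n in sizes.items()}
        self.order = deque()                                # (fu, slot), oldest first

    def dispatch(self, fu):
        """Allocate an entry in the component of the FU the instruction targets."""
        if not self.free[fu]:
            return None                                     # component full: stall
        slot = self.free[fu].popleft()
        self.components[fu][slot] = None                    # None = result pending
        self.order.append((fu, slot))
        return (fu, slot)

    def writeback(self, tag, value):
        """The FU writes its result into its own component."""
        fu, slot = tag
        self.components[fu][slot] = value

    def commit(self, width, read_ports=1):
        """Retire up to `width` finished instructions in program order.

        Assumption: each component offers `read_ports` commit reads per cycle,
        so a second read from the same component stops the commit cycle.
        """
        committed = []
        reads = {fu: 0 for fu in self.components}
        while self.order and len(committed) < width:
            fu, slot = self.order[0]
            if reads[fu] == read_ports:
                break                                       # port conflict at commit
            value = self.components[fu][slot]
            if value is None:
                break                                       # oldest not yet finished
            reads[fu] += 1
            self.order.popleft()
            del self.components[fu][slot]
            self.free[fu].append(slot)
            committed.append(value)
        return committed
```

For example, with `DistributedROB({"alu": 2, "mul": 1})`, dispatching to `alu`, `mul`, `alu` in that order and completing all three lets a single `commit(4)` call retire only the first two results; the third needs a second read from the `alu` component and waits for the next cycle.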
