Isolating short-lived operands for energy reduction

We propose a mechanism for reducing the power requirements of processors that use a separate architectural register file (ARF) to hold committed values. We exploit the notion of short-lived operands: values that target architectural registers which are renamed by the time the instruction producing the value reaches the writeback stage. Our simulations of the SPEC 2000 benchmarks show that 71 percent to 97 percent of the results are short-lived. Our technique caches (and isolates) short-lived operands within a small dedicated register file, which avoids unnecessary writebacks into the result repository (a slot within the reorder buffer (ROB) or a physical register) as well as unnecessary commit-time writes into the ARF. Operands are cached in this manner until they can be safely discarded without jeopardizing recovery from possible branch mispredictions or reconstruction of the precise state in case of interrupts or exceptions. Additional energy savings are achieved by limiting the number of ports used for instruction commitment. The power/energy savings are validated using SPICE measurements of actual layouts in a 0.18 micron CMOS process. The energy reduction in the ROB and the ARF is about 20 percent (translating into an overall chip energy reduction of about 5 percent), and this is achieved with no increase in cycle time, little additional complexity, and no degradation in the number of instructions committed per cycle.
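
The definition of a short-lived operand can be illustrated with a small sketch. The code below is not the mechanism evaluated in the paper. It is a toy classifier over a hand-written trace, and the `Instr` record, its cycle fields, and the rule "a later writer of the same register is dispatched no later than this instruction's writeback cycle" are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    # Hypothetical trace record; field names are illustrative assumptions.
    dest: int       # architectural destination register
    dispatch: int   # cycle at which the instruction is renamed/dispatched
    writeback: int  # cycle at which its result reaches the writeback stage


def short_lived_flags(trace):
    """Return one bool per instruction, in program order.

    A result is flagged short-lived when a later instruction in program
    order renames the same architectural register at or before the
    cycle in which this result reaches writeback (assumed rule).
    """
    flags = []
    for i, ins in enumerate(trace):
        renamed_in_time = any(
            later.dest == ins.dest and later.dispatch <= ins.writeback
            for later in trace[i + 1:]
        )
        flags.append(renamed_in_time)
    return flags


# Toy trace in program order (made-up cycle numbers, not measured data).
trace = [
    Instr(dest=1, dispatch=0, writeback=4),  # r1 is renamed again at cycle 1
    Instr(dest=1, dispatch=1, writeback=3),  # no later writer of r1
    Instr(dest=2, dispatch=2, writeback=5),  # r2 is renamed only at cycle 6
    Instr(dest=2, dispatch=6, writeback=8),  # no later writer of r2
]

flags = short_lived_flags(trace)
short_lived_fraction = sum(flags) / len(flags)
```

In this toy trace only the first result is flagged, since its destination register is renamed before the value is written back. In the scheme described above, such a value would be a candidate for the small dedicated register file rather than the ROB slot and the ARF.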
