Using an operand file to save energy and to decouple commit resources

The register file of a modern superscalar processor is a critical component of the pipeline and can have a large impact on processor performance. Large register files give the processor a larger window of speculation and allow it to exploit greater levels of instruction-level parallelism. However, the access time and energy consumption of the register file can grow substantially as it increases in size, especially given the number of ports it requires. This paper proposes an architecture that moves the large register file needed to fully exploit instruction-level parallelism off the schedule-to-execute path of the processor. This is accomplished by decoupling the instruction window (the instruction state maintained in the reorder buffer and register file) from the scheduling window (the working set of registers required by the instruction scheduler and execution core). The state of the scheduling window is maintained by an operand file and a speculative logical register file. The operand file stores only the input registers to be consumed by instructions in the issue queue, providing low-latency, energy-efficient storage for this working set. This design can reduce energy dissipation by a factor of 6.5 on average compared with a traditional large register file, and it allows the instruction window to be scaled independently of the register file structures on the schedule-to-execute path.
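The role of the operand file can be illustrated with a minimal Python sketch. It assumes a simple tag-based model: the operand file holds one slot per distinct source tag needed by instructions waiting in the issue queue; values that are already produced are copied in at dispatch from a backing register file (modeled here as a plain dictionary); pending values are filled when their producer writes back; and a slot is freed once its last waiting consumer issues. The names `OperandFile`, `Instr`, `dispatch`, `writeback`, and `issue` are hypothetical, the speculative logical register file is not modeled, and the sketch captures no timing or energy behavior. It is not the paper's exact organization.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Instr:
    name: str
    srcs: List[int]  # hypothetical physical tags of the input operands
    dest: int        # hypothetical physical tag of the result


class OperandFile:
    """Toy model (an assumption, not the paper's design): a small store that
    holds only the source operands of instructions in the issue queue."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.values: Dict[int, Optional[int]] = {}  # tag -> value, None while pending
        self.readers: Dict[int, int] = {}            # tag -> number of waiting consumers

    def can_dispatch(self, instr: Instr) -> bool:
        # Only tags not already resident need a new slot.
        new_tags = set(instr.srcs) - set(self.values)
        return len(self.values) + len(new_tags) <= self.capacity

    def dispatch(self, instr: Instr, backing_rf: Dict[int, int]) -> None:
        # Already-produced values are copied in from the large backing register
        # file; values not yet produced are marked pending (None).
        assert self.can_dispatch(instr), "operand file full: stall dispatch"
        for tag in instr.srcs:
            if tag not in self.values:
                self.values[tag] = backing_rf.get(tag)
            self.readers[tag] = self.readers.get(tag, 0) + 1

    def writeback(self, tag: int, value: int) -> None:
        # Result broadcast: fill the slot only if a queued instruction needs it.
        if tag in self.values:
            self.values[tag] = value

    def is_ready(self, instr: Instr) -> bool:
        return all(self.values[t] is not None for t in instr.srcs)

    def issue(self, instr: Instr) -> List[Optional[int]]:
        # Read all operands first, then free slots whose last consumer issued.
        assert self.is_ready(instr), "operands not ready"
        operands = [self.values[t] for t in instr.srcs]
        for tag in instr.srcs:
            self.readers[tag] -= 1
            if self.readers[tag] == 0:
                del self.readers[tag]
                del self.values[tag]
        return operands


if __name__ == "__main__":
    of = OperandFile(capacity=4)
    backing = {1: 10, 2: 20}                 # values already in the large register file
    add = Instr("add", srcs=[1, 2], dest=3)
    mul = Instr("mul", srcs=[3, 1], dest=4)  # depends on the result of add
    of.dispatch(add, backing)
    of.dispatch(mul, backing)
    print(of.is_ready(mul))                  # False: tag 3 is still pending
    print(of.issue(add))                     # [10, 20]
    of.writeback(3, 30)
    print(of.issue(mul))                     # [30, 10]
    print(len(of.values))                    # 0: all slots freed
```

Because slots are released as soon as their consumers issue, the operand file only needs to be sized for the working set of the issue queue, while the full instruction window lives in the larger structures off the schedule-to-execute path.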