Virtual Registers: Reducing Register Pressure Without Enlarging the Register File

This paper proposes a novel scheme to mitigate register pressure in statically scheduled high-performance embedded processors without physically enlarging the register file. Our scheme exploits the fact that a large fraction of variables are short-lived and therefore need not be written to or read from real registers. Instead, the compiler can allocate these short-lived variables to virtual registers, which are simply placeholders (rather than physical storage locations in the register file) used to identify dependences among instructions. Our experimental results demonstrate that virtual registers are very effective at reducing the number of register spills and, in many cases, achieve performance close to that of a processor with twice as many real registers. Our results also indicate that, for some multimedia and communication applications, using a large number of virtual registers with a small number of real registers can even achieve higher performance than a mid-sized register file without any virtual registers.
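The abstract does not spell out the allocation algorithm, so the following Python sketch only illustrates the core idea under stated assumptions: a single basic block in SSA form with a fixed static schedule, and a hypothetical forwarding window (`window`, in cycles) within which a value can be consumed from the pipeline without ever being written to the register file. Variables whose last use falls inside that window are treated as short-lived and assigned to virtual registers; the rest need real registers. The names `Instr`, `live_ranges`, `classify`, and `max_pressure` are illustrative only and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple

@dataclass(frozen=True)
class Instr:
    cycle: int              # issue cycle in the (assumed) static schedule
    dest: Optional[str]     # variable defined, or None if no result
    srcs: Tuple[str, ...]   # variables read

def live_ranges(schedule) -> Dict[str, Tuple[int, int]]:
    """Map each defined variable to (def_cycle, last_use_cycle); assumes SSA."""
    ranges: Dict[str, Tuple[int, int]] = {}
    for ins in sorted(schedule, key=lambda i: i.cycle):
        for s in ins.srcs:
            if s in ranges:  # ignore live-in values defined outside the block
                d, u = ranges[s]
                ranges[s] = (d, max(u, ins.cycle))
        if ins.dest is not None:
            ranges[ins.dest] = (ins.cycle, ins.cycle)
    return ranges

def classify(ranges, window: int) -> Tuple[Set[str], Set[str]]:
    """Split variables into (virtual, real) by live-range length.
    Assumption: a value whose last use is within `window` cycles of its
    definition can be forwarded and needs only a virtual register."""
    virtual, real = set(), set()
    for v, (d, u) in ranges.items():
        (virtual if u - d <= window else real).add(v)
    return virtual, real

def max_pressure(ranges, names) -> int:
    """Peak number of `names` simultaneously live, each over [def, last_use)."""
    if not names:
        return 0
    horizon = max(u for _, u in ranges.values()) + 1
    return max(
        sum(1 for v in names if ranges[v][0] <= t < ranges[v][1])
        for t in range(horizon)
    )

# Hypothetical schedule: a and s are long-lived, t1..t3 are short-lived temporaries.
SCHEDULE = [
    Instr(0, "a", ()),
    Instr(1, "b", ()),
    Instr(2, "t1", ("a", "b")),
    Instr(3, "t2", ("t1",)),
    Instr(4, "t3", ("t2", "b")),
    Instr(5, "s", ("t3",)),
    Instr(12, None, ("a", "s")),
]

if __name__ == "__main__":
    ranges = live_ranges(SCHEDULE)
    virtual, real = classify(ranges, window=2)
    print("virtual:", sorted(virtual), "real:", sorted(real))
    print("pressure without virtual registers:", max_pressure(ranges, set(ranges)))
    print("pressure on real registers:", max_pressure(ranges, real))
```

In this toy schedule the temporaries `t1`, `t2`, and `t3` are classified as virtual, and the peak demand on real registers drops from 3 to 2. This is the kind of reduction that, on a small register file, would translate into fewer spills; the actual criteria and hardware constraints used in the paper may differ.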
