Register relocation: flexible contexts for multithreading

Multithreading is an important technique that improves processor utilization by allowing computation to be overlapped with the long-latency operations that commonly occur in multiprocessor systems. This paper presents register relocation, a new mechanism that efficiently supports flexible partitioning of the register file into variable-size contexts with minimal hardware support. Because the number of registers required by thread contexts varies, this flexibility permits better utilization of scarce registers and allows more contexts to be resident, which in turn allows applications to tolerate shorter run lengths and longer latencies. Our experiments show that, compared to fixed-size hardware contexts, register relocation can improve processor utilization by a factor of two for many workloads.
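As an illustration of partitioning a register file into variable-size contexts, the following is a minimal sketch and not the paper's implementation. It assumes details the abstract does not state: that relocation is performed by OR-ing a per-thread mask into each register operand, that context sizes are rounded up to powers of two, and that each context is aligned to its size. The names `RegisterFile`, `rrm`, `round_up_pow2`, and `allocate_contexts` are hypothetical.

```python
def round_up_pow2(n):
    """Smallest power of two that is >= n (assumed context-size granularity)."""
    p = 1
    while p < n:
        p *= 2
    return p


def allocate_contexts(num_regs, needs):
    """Pack variable-size contexts into a register file of num_regs registers.

    needs[i] is the number of registers thread i requires. Returns a list of
    base addresses (None where a thread did not fit). Placing larger contexts
    first keeps every base aligned to its own power-of-two size.
    """
    sizes = [round_up_pow2(n) for n in needs]
    order = sorted(range(len(needs)), key=lambda i: -sizes[i])
    bases = [None] * len(needs)
    next_free = 0
    for i in order:
        if next_free + sizes[i] <= num_regs:
            bases[i] = next_free
            next_free += sizes[i]
    return bases


class RegisterFile:
    """Toy register file with an assumed OR-based relocation mask."""

    def __init__(self, num_regs=128):
        self.regs = [0] * num_regs
        self.rrm = 0  # relocation mask of the currently running thread

    def physical(self, operand):
        # Valid only while operand < context size, so that the operand bits
        # never overlap the bits set in the aligned base (the mask).
        return self.rrm | operand

    def read(self, operand):
        return self.regs[self.physical(operand)]

    def write(self, operand, value):
        self.regs[self.physical(operand)] = value


# Four threads needing 6, 20, 12, and 30 registers share 128 registers.
bases = allocate_contexts(128, [6, 20, 12, 30])

rf = RegisterFile(128)
rf.rrm = bases[2]   # switch to thread 2
rf.write(3, 7)      # its register 3
rf.rrm = bases[0]   # switch to thread 0
rf.write(3, 9)      # same register number, different physical register
```

In this sketch the four contexts occupy 88 of the 128 registers, leaving room for further threads, whereas fixed 32-register contexts would already fill the file with four threads. A context switch amounts to changing `rrm`.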
