Architectural and implementation tradeoffs in the design of multiple-context processors (abstract)

We examine two multiple-context schemes for scalable shared-memory multiprocessors. The blocked scheme switches between contexts on cache misses. The proposed interleaved scheme switches between available contexts on a cycle-by-cycle basis, while providing full pipeline interlocks for good single-context performance. We show that the interleaved scheme has a performance advantage over the blocked scheme because it can hide pipeline dependencies and reduce the context switch cost. We also show that, while the implementation of the interleaved scheme is more complex, this complexity is not overwhelming.
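
As a rough illustration of the difference between the two schemes, the following toy cycle-count model may help. It is only a sketch under stated assumptions and is not the simulator or the processor model used in the paper. The instruction kinds ("alu", "dep", "miss"), the constants MISS_LATENCY, DEP_STALL and SWITCH_COST, and the function names are all hypothetical choices made for this example. In particular, the interleaved scheme is modeled with zero switch cost, which is a simplification of "reduced" cost.

```python
from dataclasses import dataclass

# Hypothetical parameters chosen for illustration only.
MISS_LATENCY = 20   # cycles a context waits after a cache miss
DEP_STALL = 2       # extra cycles before a dependent instruction can issue
SWITCH_COST = 4     # cycles lost when the blocked scheme changes context
EXTRA = {"alu": 0, "dep": DEP_STALL, "miss": MISS_LATENCY}


@dataclass
class Context:
    program: list      # sequence of "alu" / "dep" / "miss"
    pc: int = 0
    ready_at: int = 0  # first cycle at which this context may issue again

    def done(self):
        return self.pc >= len(self.program)


def run_blocked(programs):
    """Run one context until it misses, then pay SWITCH_COST to change."""
    ctxs = [Context(list(p)) for p in programs]
    cycle, cur = 0, 0
    while not all(c.done() for c in ctxs):
        c = ctxs[cur]
        if c.done() or c.ready_at > cycle:
            # Current context is finished or waiting on a miss:
            # move to the unfinished context that becomes ready first.
            pending = [i for i, x in enumerate(ctxs) if not x.done()]
            nxt = min(pending, key=lambda i: (ctxs[i].ready_at, i))
            cycle = max(cycle, ctxs[nxt].ready_at)
            if nxt != cur:
                cycle += SWITCH_COST
            cur = nxt
            continue
        kind = c.program[c.pc]
        c.pc += 1
        cycle += 1
        if kind == "dep":
            cycle += DEP_STALL  # the whole processor stalls on the interlock
        elif kind == "miss":
            c.ready_at = cycle + MISS_LATENCY
    return max([cycle] + [c.ready_at for c in ctxs])


def run_interleaved(programs):
    """Each cycle, issue from the next ready context in round-robin order."""
    ctxs = [Context(list(p)) for p in programs]
    cycle, last, n = 0, -1, len(ctxs)
    while not all(c.done() for c in ctxs):
        for k in range(1, n + 1):
            i = (last + k) % n
            c = ctxs[i]
            if not c.done() and c.ready_at <= cycle:
                kind = c.program[c.pc]
                c.pc += 1
                # Only this context waits; others may fill the gap.
                c.ready_at = cycle + 1 + EXTRA[kind]
                last = i
                break
        cycle += 1
    return max([cycle] + [c.ready_at for c in ctxs])


if __name__ == "__main__":
    workload = [["alu", "dep", "alu", "miss"] * 2 for _ in range(4)]
    print("blocked:    ", run_blocked(workload))
    print("interleaved:", run_interleaved(workload))
```

In this model a dependency stall in the blocked scheme idles the whole processor, whereas in the interleaved scheme it only delays the affected context, so other contexts can issue in the meantime. With a single context the two functions take the same number of cycles, which mirrors the role of the pipeline interlocks in preserving single-context performance.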