Paper information - Architectural and implementation tradeoffs in the design of multiple-context processors

Architectural and implementation tradeoffs in the design of multiple-context processors

Multiple-context processors have been proposed as an architectural technique to mitigate the effects of large memory latency in multiprocessors. We examine two schemes for implementing multiple-context processors. The first scheme switches between contexts only on a cache miss, while the other interleaves the contexts on a cycle-by-cycle basis. Both schemes provide the capability for a single context to fully utilize the pipeline. We show that cycle-by-cycle interleaving of contexts provides a performance advantage over switching contexts only at a cache miss. This advantage results from the context interleaving hiding pipeline dependencies and reducing the context switch cost. In addition, we show that while the implementation of the interleaved scheme is more complex, the complexity is not overwhelming. As pipelines get deeper and operate at lower percentages of peak performance, the performance advantage of the interleaved scheme is likely to justify its additional complexity.

Key Words and Phrases: multiple-context processors, multithreading, latency hiding, multiprocessors, pipelining

Anoop Gupta | Mark Horowitz | James Laudon

[1] David Kroft, et al. Lockup-free instruction fetch/prefetch cache organization, 1998, ISCA '81.

[2] Anant Agarwal, et al. APRIL: a processor architecture for multiprocessing, 1990, ISCA '90.

[3] Helen Davis, et al. Tango introduction and tutorial, 1990.

[4] James H. Patterson, et al. Portable Programs for Parallel Processors, 1987.

[5] Anoop Gupta, et al. The directory-based cache coherence protocol for the DASH multiprocessor, 1990, ISCA '90.

[6] Donald Yeung, et al. The MIT Alewife machine: a large-scale distributed-memory multiprocessor, 1991.

[7] Andrew R. Pleszkun, et al. Strategies for achieving improved processor throughput, 1991, ISCA '91.

[8] A. Gupta, et al. Exploring the benefits of multiple hardware contexts in a multiprocessor architecture: preliminary results, 1989, ISCA '89.

[9] Anoop Gupta, et al. SPLASH: Stanford parallel applications for shared-memory, 1992, CARN.

[10] Michel Dubois, et al. Lockup-free Caches in High-Performance Multiprocessors, 1990, J. Parallel Distributed Comput.

[11] Kevin P. McAuliffe, et al. The IBM Research Parallel Processor Prototype (RP3): Introduction and Architecture, 1985, ICPP.