Integration of message passing and shared memory in the Stanford FLASH multiprocessor

The advantages of using message passing over shared memory for certain types of communication and synchronization have provided an incentive to integrate both models within a single architecture. A key goal of the FLASH (FLexible Architecture for SHared memory) project at Stanford is to achieve this integration while maintaining a simple and efficient design. This paper presents the hardware and software mechanisms in FLASH that support various message passing protocols. We achieve low-overhead message passing by delegating protocol functionality to the programmable node controllers in FLASH and by providing direct user-level access to this messaging subsystem. In contrast to most earlier work, we provide an integrated solution that handles the interaction of the messaging protocols with virtual memory, protected multiprogramming, and cache coherence. Detailed simulation studies indicate that this system can sustain message transfer rates of several hundred megabytes per second, effectively utilizing projected network bandwidths for next-generation multiprocessors.
