In the design of FLASH, the successor to the Stanford DASH multiprocessor, we are exploring architectural mechanisms for efficiently supporting both the shared-memory and message-passing communication models in a single system. The unique feature of the FLASH (FLexible Architecture for SHared memory) system is a programmable controller at each node that replaces the hardwired cache-coherence state machines found in systems like DASH. The base coherence protocol is supported by executing software handlers on the programmable controller to service memory and coherence operations. The same programmable controller is also used to support message passing. This approach is attractive for two reasons: software provides the flexibility to implement different coherence and message-passing protocols, and shifting complexity from hardware to software simplifies system design and debugging. This paper focuses on the use of the programmable controller to support message passing. Our goal is to provide message-passing performance comparable to that of an aggressive hardware implementation dedicated to this task. In FLASH, message data is transferred as a sequence of cache-line-sized units, thus exploiting the datapath support already present for cache coherence. In addition, we avoid costly interrupts to the main processor by having the programmable controller handle control of message transfers. Furthermore, in contrast to most earlier work, we provide an integrated solution that handles the interaction of message data with virtual memory, protected multiprogramming, and cache coherence. Our preliminary performance studies indicate that this system can sustain message transfers at a rate of several hundred megabytes per second, efficiently utilizing the available network bandwidth.
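The idea of moving message data as cache-line-sized units can be illustrated with a minimal sketch. This is not the FLASH handler code: the function names, the zero-padding of the final unit, and the 128-byte line size are all assumptions made for illustration only.

```python
from typing import List

# Assumed line size for illustration; the text above does not state it.
CACHE_LINE_BYTES = 128


def split_into_line_units(message: bytes,
                          line_bytes: int = CACHE_LINE_BYTES) -> List[bytes]:
    """Split a message into fixed-size units, zero-padding the last one.

    Hypothetical helper: it models only the data layout of a transfer,
    not the controller, the network, or the coherence protocol.
    """
    if line_bytes <= 0:
        raise ValueError("line_bytes must be positive")
    units = []
    for offset in range(0, len(message), line_bytes):
        unit = message[offset:offset + line_bytes]
        # Pad a short trailing unit so every unit is exactly one line long.
        units.append(unit.ljust(line_bytes, b"\x00"))
    return units


def reassemble(units: List[bytes], length: int) -> bytes:
    """Concatenate the units and drop the padding, given the true length."""
    return b"".join(units)[:length]
```

Under these assumptions, a 200-byte message occupies two units, and the receiver needs the original length to discard the padding in the last unit.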