Active messages: a mechanism for integrating communication and computation

The design challenge for large-scale multiprocessors is (1) to minimize communication overhead, (2) to allow communication to overlap computation, and (3) to coordinate the two without sacrificing processor cost/performance. We show that existing message-passing multiprocessors have unnecessarily high communication costs. Research prototypes of message-driven machines demonstrate low communication overhead but poor processor cost/performance. We introduce a simple communication mechanism, Active Messages, and show that it is intrinsic to both architectures, allows cost-effective use of the hardware, and offers tremendous flexibility. Implementations on the nCUBE/2 and CM-5 are described and evaluated using a split-phase shared-memory extension to C, Split-C. We further show that active messages are sufficient to implement the dynamically scheduled languages for which message-driven machines were designed. With this mechanism, latency tolerance becomes a programming/compiling concern. Hardware support for active messages is desirable, and we outline a range of enhancements to mainstream processors.
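As a rough illustration of the mechanism named above, the following C sketch simulates two nodes in a single process. Each message carries a pointer to a handler that runs on arrival and moves the data into the ongoing computation, and a remote read is issued as a split-phase request and reply. All names here (`am_msg`, `am_send`, `am_poll`, `get_request`, `get_reply`, `demo_remote_read`) are hypothetical, the in-memory queues stand in for a real network, and nothing here reflects the actual nCUBE/2, CM-5, or Split-C interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: every name below is illustrative only. */
typedef struct am_msg am_msg;
typedef void (*am_handler)(int self, const am_msg *m);

struct am_msg {
    am_handler handler;  /* run on the destination node when the message arrives */
    int src;             /* node that sent the message */
    int *remote;         /* address to read on the owning node */
    int *local;          /* address to fill on the requesting node */
    int value;           /* data carried by a reply */
};

enum { NODES = 2, QCAP = 8 };

/* One FIFO inbox per simulated node stands in for the network. */
static am_msg inbox[NODES][QCAP];
static size_t head[NODES], tail[NODES];

static void am_send(int dest, am_msg m) {
    assert(tail[dest] - head[dest] < QCAP);  /* the sketch has no flow control */
    inbox[dest][tail[dest]++ % QCAP] = m;
}

/* Drain the inbox of node `self`, running each handler to completion. */
static void am_poll(int self) {
    while (head[self] != tail[self]) {
        am_msg m = inbox[self][head[self]++ % QCAP];
        m.handler(self, &m);
    }
}

static int pending;  /* outstanding split-phase reads on the requester */

/* Reply handler: store the data in the requester's memory and signal completion. */
static void get_reply(int self, const am_msg *m) {
    (void)self;
    *m->local = m->value;
    pending--;
}

/* Request handler: read the local word and send it straight back. */
static void get_request(int self, const am_msg *m) {
    am_msg r = { .handler = get_reply, .src = self,
                 .local = m->local, .value = *m->remote };
    am_send(m->src, r);
}

/* Node 0 reads a word owned by node 1 using a split-phase get. */
int demo_remote_read(void) {
    static int owned_by_node1 = 42;
    int result = 0;

    pending = 1;
    am_msg req = { .handler = get_request, .src = 0,
                   .remote = &owned_by_node1, .local = &result };
    am_send(1, req);

    /* Node 0 would be free to do unrelated computation at this point. */

    am_poll(1);               /* node 1 services the request */
    while (pending > 0) {
        am_poll(0);           /* node 0 picks up the reply */
    }
    return result;
}
```

Calling `demo_remote_read()` returns 42 once the reply handler has run. Handlers in this sketch do no blocking work: they only move data and update a counter, which is what lets the requester decide when to wait and is one way to read the claim that latency tolerance becomes a programming and compiling concern.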
