Early Experience with Message-Passing on the SHRIMP Multicomputer

The SHRIMP multicomputer provides virtual memory-mapped communication (VMMC), which supports protected, user-level message passing, allows user programs to perform their own buffer management, and separates data transfers from control transfers so that a data transfer can be done without the intervention of the receiving node's CPU. An important question is whether such a mechanism can indeed deliver all of the available hardware performance to applications that use conventional message-passing libraries. This paper reports our early experience with message passing on a small, working SHRIMP multicomputer. We have implemented several user-level communication libraries on top of the VMMC mechanism, including the NX message-passing interface, Sun RPC, stream sockets, and specialized RPC. The first three are fully compatible with existing systems. Our experience shows that the VMMC mechanism supports these message-passing interfaces well. When zero-copy protocols are allowed by the semantics of the interface, VMMC can effectively deliver to applications almost all of the raw hardware's communication performance.
