Evaluation of architectural support for global address-based communication in large-scale parallel machines

Large-scale parallel machines are incorporating increasingly sophisticated architectural support for user-level messaging and global memory access. We provide a systematic evaluation of a broad spectrum of current design alternatives based on our implementations of a global address language on the Thinking Machines CM-5, Intel Paragon, Meiko CS-2, Cray T3D, and Berkeley NOW. This evaluation includes a range of compilation strategies that make varying use of the network processor; each is optimized for the target architecture and the particular strategy. We analyze a family of interacting issues that determine the performance trade-offs in each implementation, quantify the resulting latency, overhead, and bandwidth of the global access operations, and demonstrate the effects on application performance.
