A software architecture for global address space communication on clusters: put/get on fast messages

Global address space parallel programming models can be an effective alternative to send/receive-style communication, simplifying programming or code generation and improving performance for certain types of applications. Traditionally, global address space mechanisms have been implemented in hardware to provide the necessary communication performance and responsiveness. However, new high-performance cluster messaging systems now allow these mechanisms to be realized efficiently in software. We describe a high-performance one-sided communication model implemented as a software layer on top of the Illinois Fast Messages (FM) system. We evaluate several software implementation architectures for the remote agent and characterize how their performance differs. Our Put/Get FM implementation achieves peak bandwidths of 67 MBytes/s for put/get operations, overheads of a few microseconds, and remote read latencies as low as 26 microseconds on a Myrinet-connected PC cluster. This implementation was released publicly as part of HPVM 1.0 in August 1997 and is seeing significant use. It has been used to implement the Global Arrays library and also serves as a back-end target for PGI's commercial HPF compiler.
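To make the one-sided model concrete, the following is a minimal in-process sketch of how put and get can be layered on a message-passing substrate with a remote agent servicing requests. It is an illustration under stated assumptions, not the Put/Get FM interface: the names `Cluster`, `Node`, `put`, `get`, and `run_agent` are hypothetical, the "network" is a per-node FIFO queue, and no FM or HPVM calls are modeled.

```python
from collections import deque


class Node:
    """One simulated cluster node: local memory plus an inbox of messages."""

    def __init__(self, rank, mem_size):
        self.rank = rank
        self.memory = bytearray(mem_size)
        self.inbox = deque()   # stands in for the messaging layer's receive queue
        self.replies = {}      # request id -> bytes returned by a completed get


class Cluster:
    """Hypothetical put/get layer built on simple message delivery."""

    def __init__(self, n_nodes, mem_size=64):
        self.nodes = [Node(r, mem_size) for r in range(n_nodes)]
        self._next_id = 0

    def _send(self, dst, msg):
        # Assumption: reliable, in-order delivery between nodes.
        self.nodes[dst].inbox.append(msg)

    def put(self, src, dst, remote_addr, data):
        """One-sided write: the target application never posts a receive."""
        self._send(dst, ("PUT", src, remote_addr, bytes(data)))

    def get(self, src, dst, remote_addr, length):
        """One-sided read: returns a request id to look up the reply later."""
        req_id = self._next_id
        self._next_id += 1
        self._send(dst, ("GET", src, remote_addr, length, req_id))
        return req_id

    def run_agent(self, rank):
        """Remote agent: drain the inbox and service requests on local memory."""
        node = self.nodes[rank]
        while node.inbox:
            msg = node.inbox.popleft()
            kind = msg[0]
            if kind == "PUT":
                _, _src, addr, data = msg
                node.memory[addr:addr + len(data)] = data
            elif kind == "GET":
                _, src, addr, length, req_id = msg
                payload = bytes(node.memory[addr:addr + length])
                self._send(src, ("GET_REPLY", req_id, payload))
            elif kind == "GET_REPLY":
                _, req_id, payload = msg
                node.replies[req_id] = payload


# Usage: node 0 writes into node 1's memory, then reads it back.
cluster = Cluster(2)
cluster.put(src=0, dst=1, remote_addr=8, data=b"hello")
req = cluster.get(src=0, dst=1, remote_addr=8, length=5)
cluster.run_agent(1)   # node 1's agent applies the put and answers the get
cluster.run_agent(0)   # node 0's agent records the get reply
result = cluster.nodes[0].replies[req]
```

In this sketch the agent runs only when `run_agent` is called explicitly. Where and when the agent executes on a real node is one of the design choices that distinguishes the implementation architectures the abstract mentions; the sketch does not attempt to reproduce any of them.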
