Highly efficient implementation of MPI point-to-point communication using remote memory operations

MPI point-to-point communication is a basic operation, but it requires matching sends with receives at run time, which reduces performance. This paper proposes a new approach that sends messages by remote memory write without querying the receiver, for communication patterns in which the nonblocking receive is issued in advance. In principle, this approach makes it possible to obtain latency and bandwidth close to the hardware specification. MPI-EMX, our implementation of MPI on the EM-X multiprocessor, achieves a zero-byte latency of 13.4 µsec and a maximum bandwidth of 31.4 MB/s, which is competitive with commercial MPPs. This approach to reducing communication latency is widely applicable to other systems and is a promising technique for achieving low latency and high bandwidth.
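The idea of writing directly into a receive buffer that was posted in advance can be illustrated with a small single-process sketch in C. This is not the MPI-EMX code: the names `posted_recv`, `receiver_state`, `post_recv`, and `send_direct` are hypothetical, `memcpy` stands in for the hardware remote memory write, and the table of posted receives is assumed to be visible to the sender. What happens when no receive has been posted yet is not described in the abstract, so the sketch only reports that case to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_POSTED 8  /* hypothetical limit on outstanding receives */

/* Hypothetical descriptor published when a nonblocking receive is posted. */
typedef struct {
    int    in_use;    /* slot holds a posted receive */
    int    source;    /* rank of the expected sender */
    int    tag;       /* message tag to match */
    void  *buf;       /* destination buffer in receiver memory */
    size_t capacity;  /* size of buf in bytes */
    size_t received;  /* bytes delivered so far */
    int    complete;  /* set by the sender once the data has landed */
} posted_recv;

typedef struct {
    posted_recv slot[MAX_POSTED];
} receiver_state;

/* Receiver side: record a receive before the matching send is issued.
 * Returns the slot index, or -1 if the table is full. */
int post_recv(receiver_state *r, int source, int tag, void *buf, size_t capacity)
{
    assert(r != NULL);
    assert(buf != NULL || capacity == 0);
    for (int i = 0; i < MAX_POSTED; i++) {
        if (!r->slot[i].in_use) {
            posted_recv p = { 1, source, tag, buf, capacity, 0, 0 };
            r->slot[i] = p;
            return i;
        }
    }
    return -1;
}

/* Sender side: if a matching receive is already posted, write the payload
 * straight into the destination buffer without asking the receiver.
 * Returns 1 on the direct path, 0 if no matching receive is posted
 * (the caller would need some other protocol), -1 if the message
 * does not fit in the posted buffer. */
int send_direct(receiver_state *r, int my_rank, int tag, const void *data, size_t len)
{
    assert(r != NULL);
    for (int i = 0; i < MAX_POSTED; i++) {
        posted_recv *p = &r->slot[i];
        if (p->in_use && !p->complete && p->source == my_rank && p->tag == tag) {
            if (len > p->capacity)
                return -1;
            if (len > 0)
                memcpy(p->buf, data, len);  /* stand-in for a remote memory write */
            p->received = len;
            p->complete = 1;                /* receiver observes completion by polling */
            return 1;
        }
    }
    return 0;
}
```

In this sketch a receiver would call `post_recv` first, the sender would then call `send_direct` with its own rank and the same tag, and the receiver would poll the slot's `complete` flag. A zero-byte message takes the same path and only sets the flag, which is the case a zero-byte latency figure measures.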