Design and Implementation of FMPL, a Fast Message-Passing Library for Remote Memory Operations

FMPL, a fast message-passing library, has been designed and developed to maximize communication performance by exploiting general architectural communication support such as remote memory operations, and to maximize overall performance by eliminating dynamic communication overhead and overlapping communication with computation. FMPL provides low-cost, general-purpose point-to-point communication and collective communication such as broadcast, barrier synchronization, and reduction. On a Hitachi SR8000, FMPL achieves a latency of 12.8 µs for an 8-byte message, compared with 20 µs for MPI. FMPL is intended both as a base for building higher-level message-passing libraries such as BLACS and for applications that need maximum performance.
