Acceleration for MPI derived datatypes using an enhancer of memory and network

This paper presents a support function for MPI derived datatypes on an enhancer of memory and network named DIMMnet-3, a network interface under development that provides vector access functions and multi-banked extended memory. Semi-hardwired derived datatype communication based on RDMA with hardwired scatter and gather is proposed. This mechanism and an MPI implementation using it are implemented and validated on DIMMnet-2, an earlier prototype that operates in a DDR DIMM slot. For scatter and gather transfers of 8-byte elements with large strides, the vector commands of DIMMnet-2 deliver 6.8 times the performance of software on the host. A proprietary benchmark of MPI derived datatype communication that transfers a submatrix corresponding to a narrow HALO region is executed. The bandwidth observed on DIMMnet-2 is far higher than that reported under similar conditions for a VAPI-based MPI implementation on InfiniBand, even though a much older generation of FPGA and a weaker CPU and motherboard are used. This function also avoids cache pollution and frees CPU time for processing local data, which can be overlapped with communication. SGI Altix UV, a new commercial machine with vector scatter/gather functions in its NIC, was recently launched. It may be able to adopt our proposed concept partially, even though the capacity and fine-grain access throughput of the main memory attached to the CPU are not enhanced on it.
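
To make the benchmark scenario concrete, the sketch below (not taken from the paper; the matrix size, tag, and rank layout are illustrative assumptions) shows how a strided submatrix such as a HALO boundary column is typically described with an MPI derived datatype. The resulting access pattern, single 8-byte elements separated by a large stride, is exactly the kind of non-contiguous transfer that the proposed hardwired scatter/gather mechanism is intended to accelerate.

```c
/* Minimal sketch (assumed example, not the paper's benchmark code):
 * exchanging one column-oriented HALO strip of an N x N row-major
 * matrix as a single MPI derived datatype. */
#include <mpi.h>
#include <stdlib.h>

#define N   1024   /* local matrix dimension (assumed) */
#define TAG 0

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *a = calloc((size_t)N * N, sizeof(double));

    /* One 8-byte element per row, separated by a stride of N elements:
     * the "8-byte elements with a large stride" pattern described in
     * the abstract. */
    MPI_Datatype halo_col;
    MPI_Type_vector(N,          /* count: one block per row */
                    1,          /* blocklength: one double  */
                    N,          /* stride in elements       */
                    MPI_DOUBLE, &halo_col);
    MPI_Type_commit(&halo_col);

    if (size >= 2) {
        if (rank == 0)
            /* send the rightmost column to the neighbour */
            MPI_Send(&a[N - 1], 1, halo_col, 1, TAG, MPI_COMM_WORLD);
        else if (rank == 1)
            /* receive it into the leftmost column */
            MPI_Recv(&a[0], 1, halo_col, 0, TAG, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }

    MPI_Type_free(&halo_col);
    free(a);
    MPI_Finalize();
    return 0;
}
```

In a purely software-based MPI implementation this datatype is packed element by element on the host CPU, polluting the cache; offloading the pack/unpack to scatter/gather hardware in the NIC, as proposed, removes that cost from the CPU.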
