PM/InfiniBand-FJ: a high performance communication facility using InfiniBand for large scale PC clusters

This work describes the design of PM/InfiniBand-FJ, a high performance communication facility that uses the InfiniBand interconnect for large scale PC clusters. PM/InfiniBand-FJ has been developed to achieve higher application performance than commercial supercomputers, with comparable availability. Because the InfiniBand specification is designed for communication among servers and I/O devices, several issues arise when InfiniBand is used for high performance computation on PC clusters of more than 1000 nodes. PM/InfiniBand-FJ addresses these issues by extending the original InfiniBand specification. We have implemented PM/InfiniBand-FJ on the SCore cluster system software and evaluated its communication and application performance. The results show a bandwidth of 913.2 MB/s and a round trip time of 15.6 μs on a 2.8 GHz Xeon PC with the ServerWorks GC LE chipset. In the NAS Parallel Benchmarks, the 128-node result for IS Class B on PM/InfiniBand-FJ is 1.52 times faster than that on PM/MyrinetXP, using a Fujitsu PRIMERGY RX200 PC cluster (3.06 GHz Xeon).
