The overhead of copying data through the central processor in a message-passing protocol limits data transfer bandwidth. If the network interface transfers data directly from the user's memory to the network by DMA, such data copies can be eliminated. Since the DMA facility accesses the physical memory address space, user virtual memory must be pinned down to physical memory before a message is sent or received. If each message transfer invokes the pin-down and release kernel primitives, message transfer bandwidth decreases because those primitives are quite expensive. The authors propose zero-copy message transfer with a pin-down cache technique, which reuses pinned-down areas to reduce the number of calls to the pin-down and release primitives. The proposed facility has been implemented in the PM low-level communication library on the RWC PC Cluster II, which consists of 64 Pentium Pro 200 MHz CPUs connected by a Myricom Myrinet network and runs NetBSD. PM achieves 108.8 MBytes/sec at a 100% pin-down cache hit ratio and 78.7 MBytes/sec when every transfer misses the pin-down cache. An MPI library has been implemented on top of PM. According to the NAS Parallel Benchmarks results, an application still benefits from zero-copy transfer even when the pin-down cache miss ratio is very high.
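The core idea, deferring the release of a pinned region so that later transfers from the same buffer can reuse it, can be illustrated with a minimal user-level sketch. This is not PM's implementation (PM manages DMA-able regions in cooperation with the Myrinet firmware and the kernel); here mlock()/munlock() merely stand in for the pin-down and release primitives, and the names pin_for_dma, cache_hit, and CACHE_SLOTS, as well as the fixed-size table with FIFO eviction, are hypothetical choices for illustration.

```c
/* Sketch of a pin-down cache: pinned regions are released only on
 * eviction, so repeated transfers from the same buffer skip the
 * expensive pin-down/release kernel primitives. Illustrative only. */
#include <stddef.h>
#include <sys/mman.h>

#define CACHE_SLOTS 8

struct pinned_region {
    void  *addr;   /* start of the pinned user buffer */
    size_t len;    /* length of the pinned region     */
    int    valid;
};

static struct pinned_region cache[CACHE_SLOTS];
static int next_victim;            /* FIFO eviction cursor */

/* Return nonzero if [addr, addr+len) is fully covered by a cached entry. */
static int cache_hit(const void *addr, size_t len)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        const struct pinned_region *r = &cache[i];
        if (r->valid &&
            (const char *)addr >= (const char *)r->addr &&
            (const char *)addr + len <= (const char *)r->addr + r->len)
            return 1;
    }
    return 0;
}

/* Ensure the buffer is pinned before handing it to the NIC for DMA.
 * On a hit the pin-down primitive is skipped entirely; on a miss one
 * cached region is released and the new buffer is pinned in its place. */
int pin_for_dma(void *addr, size_t len)
{
    if (cache_hit(addr, len))
        return 0;                            /* reuse the pinned-down area */

    struct pinned_region *victim = &cache[next_victim];
    if (victim->valid) {
        munlock(victim->addr, victim->len);  /* deferred release on eviction */
        victim->valid = 0;
    }

    if (mlock(addr, len) != 0)
        return -1;                           /* pin-down failed */

    victim->addr  = addr;
    victim->len   = len;
    victim->valid = 1;
    next_victim   = (next_victim + 1) % CACHE_SLOTS;
    return 0;
}
```

The reported bandwidth gap (108.8 MBytes/sec at a 100% hit ratio versus 78.7 MBytes/sec when every transfer misses) corresponds to the two paths above: a hit avoids both kernel primitives, while a miss pays for one release and one pin-down per transfer.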