SMARTMAP: Operating system support for efficient data sharing among processes on a multi-core processor

This paper describes SMARTMAP, an operating system technique that implements fixed-offset virtual memory addressing. SMARTMAP allows application processes on a multi-core processor to directly access each other's memory without the overhead of kernel involvement. When used to implement MPI, SMARTMAP eliminates all extraneous memory-to-memory copies imposed by UNIX-based shared memory strategies. In addition, SMARTMAP can easily support operations that UNIX-based shared memory cannot, such as direct, in-place MPI reduction operations and one-sided get/put operations. We have implemented SMARTMAP in the Catamount lightweight kernel for the Cray XT and modified MPI and Cray SHMEM libraries to use it. Micro-benchmark results show that SMARTMAP delivers significant improvements in latency, bandwidth, and small-message rate on a quad-core processor.
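To make the fixed-offset idea concrete, the C sketch below shows how a library might translate a peer process's virtual address into a locally dereferenceable pointer with simple arithmetic and no system call. It is a minimal illustration under the assumption that the kernel maps each core's process at a fixed, per-core stride in every process's address space; the constant SMARTMAP_STRIDE and the helper smartmap_remote_ptr() are hypothetical names, not the actual Catamount interface.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-core stride at which peer address spaces are assumed
 * to be mapped; illustrative only, not the real Catamount layout. */
#define SMARTMAP_STRIDE ((uintptr_t)1 << 39)

/* Translate an address that is valid in the peer's own address space into
 * an address the caller could dereference directly under a fixed-offset
 * mapping scheme. Core numbering and the offset calculation are assumptions
 * made for illustration. */
static inline void *smartmap_remote_ptr(unsigned peer_core, const void *peer_vaddr)
{
    return (void *)((uintptr_t)peer_vaddr +
                    (uintptr_t)(peer_core + 1) * SMARTMAP_STRIDE);
}

int main(void)
{
    int local_buf = 42;

    /* With such a mapping, an MPI or SHMEM library could copy directly from
     * a peer's send buffer into the local receive buffer in one memcpy,
     * avoiding the intermediate shared-memory bounce buffer used by
     * UNIX-based strategies. Here we only compute and print the translated
     * address rather than dereference it. */
    void *peer_view = smartmap_remote_ptr(1, &local_buf);
    printf("local %p -> fixed-offset view of the same address on core 1: %p\n",
           (void *)&local_buf, peer_view);
    return 0;
}
```

Because the translation is pure pointer arithmetic, one-sided get/put and in-place reductions reduce to ordinary loads and stores on the peer's data, which is what lets SMARTMAP avoid kernel involvement on the data path.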
