A new DMA registration strategy for pinning-based high performance networks

This paper proposes a new memory registration strategy for supporting Remote DMA (RDMA) operations over pinning-based networks, motivated by the observation that existing approaches are insufficient for efficiently implementing Global Address Space (GAS) languages. Although existing approaches often maximize bandwidth, they require levels of synchronization that discourage one-sided communication and can impose significant latency costs on small messages. The proposed Firehose algorithm attempts to expose one-sided, zero-copy communication as the common case while minimizing the number of host-level synchronizations required to support remote memory operations. The basic idea is to reap the performance benefits of a pin-everything approach in the common case (without its drawbacks) and to revert to a rendezvous-based approach in the uncommon case. In all cases, the algorithm attempts to amortize the cost of synchronization and pinning over multiple remote memory operations, improving on rendezvous by avoiding many handshaking messages and the cost of re-pinning recently used pages. Performance results show that the cost of two-sided handshaking and memory registration is negligible when the set of remotely referenced memory pages on a given node is smaller than physical memory, so that the entire working set can remain pinned. For applications with larger working sets, performance degrades gracefully and consistently outperforms conventional approaches.
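The amortization idea in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the Firehose algorithm itself: the class `PinnedPageCache`, its method `remote_put`, the 4 KiB page size, and the least-recently-used eviction policy are all hypothetical choices made for illustration. The initiator keeps a bounded table of remote pages it believes are pinned. A put that touches only pages in the table proceeds as a one-sided operation, while a put that touches an unpinned page pays for one handshake, which pins the missing pages and may unpin older ones.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes (illustrative)


class PinnedPageCache:
    """Hypothetical initiator-side table of remote pages known to be pinned."""

    def __init__(self, max_pinned_pages):
        self.max_pinned_pages = max_pinned_pages
        self.pinned = OrderedDict()  # page number -> True, oldest use first
        self.handshakes = 0          # operations that needed a round trip
        self.one_sided_ops = 0       # operations that needed no handshake
        self.unpins = 0              # pages evicted to make room

    def remote_put(self, addr, nbytes):
        """Simulate a put of nbytes to remote address addr.

        Returns "one-sided" if every touched page was already pinned,
        otherwise "handshake".
        """
        if nbytes <= 0:
            raise ValueError("nbytes must be positive")
        first = addr // PAGE_SIZE
        last = (addr + nbytes - 1) // PAGE_SIZE
        pages = list(range(first, last + 1))
        if len(pages) > self.max_pinned_pages:
            raise ValueError("request spans more pages than can be pinned")

        # Mark pages that are already pinned as recently used, so they
        # are not chosen for eviction while serving this request.
        missing = []
        for page in pages:
            if page in self.pinned:
                self.pinned.move_to_end(page)
            else:
                missing.append(page)

        if not missing:
            self.one_sided_ops += 1
            return "one-sided"

        # One handshake covers all missing pages of this request.
        self.handshakes += 1
        for page in missing:
            if len(self.pinned) >= self.max_pinned_pages:
                self.pinned.popitem(last=False)  # unpin least recently used
                self.unpins += 1
            self.pinned[page] = True
        return "handshake"


if __name__ == "__main__":
    cache = PinnedPageCache(max_pinned_pages=2)
    for addr in (0, 10, 4096, 8192, 4100):
        print(addr, cache.remote_put(addr, 100))
    print("handshakes:", cache.handshakes, "one-sided:", cache.one_sided_ops)
```

When the set of referenced pages fits within `max_pinned_pages`, only the first touch of each page costs a handshake and every later operation is one-sided. Once the working set exceeds the limit, handshakes and unpins reappear in proportion to how often evicted pages are reused.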
