An efficient design for fast memory registration in RDMA

Remote Direct Memory Access (RDMA) improves network bandwidth and reduces latency by eliminating unnecessary copies between the network interface card and application buffers. However, managing communication buffers so as to reduce the cost of memory registration and deregistration remains a significant challenge. Previous studies use a pin-down cache and batched deregistration, but manage the cache space with only a simple LRU replacement algorithm. In this paper, we evaluate the cost of memory registration in both user and kernel space. Based on our analysis, we reduce the overhead of communication buffer management in two ways at once: we introduce a Memory Registration Region Cache (MRRC), and we optimize the RDMA communication process between clients and servers with a Fast RDMA Read and Write Process (FRRWP). MRRC manages memory at the granularity of memory regions, and replaces old memory regions according to both their size and their recency. FRRWP overlaps the memory registrations of a client and a server, and allows applications to submit RDMA write operations without being blocked by message synchronization. We compare the performance of MRRC and FRRWP with that of traditional RDMA operations. The results show that our new design reduces the total cost of memory registration and the overall communication latency by up to 70%.
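
To make the idea of a region cache that weighs both size and recency more concrete, the following is a minimal Python sketch. It is not the MRRC implementation: the abstract does not give the replacement formula, so the score used here (age multiplied by region length, evicting the highest score first) is purely an assumption for illustration. The names `Region`, `RegionCache`, and `acquire` are hypothetical, and registration is only simulated by a counter rather than performed through a real RDMA verbs call.

```python
from dataclasses import dataclass


@dataclass
class Region:
    addr: int
    length: int
    last_used: int  # logical clock value of the most recent access


class RegionCache:
    """Toy registration cache; registration itself is only simulated."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes  # budget of registered (pinned) bytes
        self.used = 0
        self.clock = 0
        self.regions = {}       # (addr, length) -> Region
        self.registrations = 0  # number of simulated registrations

    def _score(self, region):
        # Assumed policy (not taken from the paper): regions that are
        # both older and larger are evicted first.
        age = self.clock - region.last_used
        return age * region.length

    def acquire(self, addr, length):
        """Return True on a cache hit, False if the buffer was registered."""
        if length > self.capacity:
            raise ValueError("region larger than the cache capacity")
        self.clock += 1
        key = (addr, length)
        region = self.regions.get(key)
        if region is not None:
            region.last_used = self.clock  # refresh recency on a hit
            return True
        # Miss: evict highest-scoring regions until the new one fits.
        while self.used + length > self.capacity:
            victim = max(self.regions.values(), key=self._score)
            del self.regions[(victim.addr, victim.length)]
            self.used -= victim.length
        self.regions[key] = Region(addr, length, self.clock)
        self.used += length
        self.registrations += 1  # stands in for a real registration call
        return False
```

A caller would invoke `acquire(addr, length)` before each transfer and pay the registration cost only when it returns `False`. Swapping `_score` for plain age would recover the simple LRU behaviour that the abstract attributes to earlier pin-down caches.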
