Towards Tunable RDMA Parameter Selection at Runtime for Datacenter Applications

Because of its low latency and high throughput, RDMA is increasingly used to redesign collaborative datacenter applications for better performance. RDMA exposes a variety of low-level hardware primitives as API parameters, and application designers typically select and hardcode them to extract RDMA's full performance. However, given the dynamic nature of datacenter applications, a hardcoded, fixed parameter selection fails to take full advantage of RDMA's capabilities and can cause up to 35% throughput loss. To address this issue, we present a tunable RDMA parameter selection framework that allows parameters to be tuned at runtime, adapting to the current application and server status. To retain native RDMA performance, we use a lightweight decision tree to reduce the overhead of parameter selection. Finally, we implement the tunable parameter selection framework on top of the native RDMA API to provide a more abstract API, and, to demonstrate the effectiveness of our method, we build a key-value service on this abstract API. Experimental results show that our implementation incurs only a small overhead compared with native RDMA, while the optimized key-value service achieves 112% higher throughput than Pilaf and 66% higher throughput than FaRM.
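The abstract does not give the framework's actual decision logic, but the idea of replacing hardcoded RDMA parameters with a lightweight runtime decision tree can be sketched as follows. The function name, the specific parameters chosen (verb type, inlining, selective signaling), and all thresholds are illustrative assumptions, loosely echoing published RDMA design guidance such as [13], not the paper's implementation.

```python
# Hypothetical sketch: runtime RDMA parameter selection via a small
# hardcoded decision tree. All names and thresholds are assumptions
# for illustration; they are not taken from the paper.

def select_rdma_params(msg_size: int, outstanding_reqs: int):
    """Pick (verb, inline, signaled) for the next RDMA operation
    based on runtime status instead of compile-time constants."""
    if msg_size <= 64:
        # Small payloads: a one-sided WRITE with inlined data lets the
        # NIC skip a separate DMA read of the payload.
        verb, inline = "WRITE", True
    elif outstanding_reqs > 16:
        # Under high concurrency, fall back to two-sided SEND/RECV so the
        # server CPU can batch completions (illustrative heuristic).
        verb, inline = "SEND", False
    else:
        verb, inline = "READ", False
    # Selective signaling: only request a completion for larger messages,
    # reducing per-operation completion-queue overhead.
    signaled = msg_size > 64
    return verb, inline, signaled
```

Because the tree is a handful of integer comparisons, re-evaluating it per operation (or per batch) adds negligible cost relative to an RDMA verb posting, which is the property the abstract's "lightweight decision tree" claim relies on.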

[1] Feng Li et al. Accelerating Relational Databases by Leveraging Remote Memory and RDMA. SIGMOD, 2016.

[2] Yiying Zhang et al. LITE Kernel RDMA Support for Datacenter Applications. SOSP, 2017.

[3] Miguel Castro et al. FaRM: Fast Remote Memory. NSDI, 2014.

[4] Jinyang Li et al. Using One-Sided RDMA Reads to Build a Fast, CPU-Efficient Key-Value Store. USENIX ATC, 2013.

[5] Song Jiang et al. Workload Analysis of a Large-Scale Key-Value Store. SIGMETRICS, 2012.

[6] Michael Kaminsky et al. Using RDMA Efficiently for Key-Value Services. SIGCOMM, 2014.

[7] Kang Chen et al. RFP: When RPC is Faster than Server-Bypass with RDMA. EuroSys, 2017.

[8] Gang Chen et al. Efficient Distributed Memory Management with RDMA and Caching. Proc. VLDB Endow., 2018.

[9] Wei Xu et al. Improving Spark Performance with Zero-Copy Buffer Management and RDMA. IEEE INFOCOM Workshops, 2016.

[10] David G. Andersen et al. FaSST: Fast, Scalable and Simple Distributed Transactions with Two-Sided (RDMA) Datagram RPCs. OSDI, 2016.

[11] Kang G. Shin et al. Efficient Memory Disaggregation with Infiniswap. NSDI, 2017.

[12] Dhabaleswar K. Panda et al. Accelerating Spark with RDMA for Big Data Processing: Early Experiences. IEEE Symposium on High-Performance Interconnects (HOTI), 2014.

[13] David G. Andersen et al. Design Guidelines for High Performance RDMA Systems. USENIX ATC, 2016.

[14] Tao Li et al. Octopus: an RDMA-enabled Distributed Persistent Memory File System. USENIX ATC, 2017.