HatRPC: Hint-Accelerated Thrift RPC over RDMA

In this paper, we propose HatRPC, a novel hint-accelerated Remote Procedure Call (RPC) framework based on Apache Thrift over Remote Direct Memory Access (RDMA) protocols. HatRPC introduces a hierarchical hint scheme for optimizing heterogeneous RPC services and functions. The hint design comprises service-granularity and function-granularity hints, which serve varied optimization goals and reduce the design space for further optimizing the underlying RDMA communication engine. We co-design a key-value store called HatKV with HatRPC and LMDB. We validate and evaluate the effectiveness and efficiency of HatRPC with our proposed Apache Thrift Benchmarks (ATB), YCSB, and TPC-H workloads. Performance evaluations show that HatRPC delivers up to 55% performance improvement for ATB benchmarks and up to 1.51X speedup for TPC-H queries compared with vanilla Thrift over IPoIB. In addition, the co-designed HatKV achieves up to 85.5% improvement for YCSB workloads.
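
To make the two hint granularities concrete, the following is a minimal sketch of how service-level and function-level hints might be combined for a single RPC function. It is not HatRPC's actual API: the class name `ServiceHints`, the hint key `perf_goal`, its values, and the rule that a function-level hint overrides a service-level one are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ServiceHints:
    """Hypothetical container for hints at two granularities."""

    # Hints that apply to every function of the service.
    service: Dict[str, str] = field(default_factory=dict)
    # Per-function hints, keyed by function name.
    functions: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def resolve(self, function_name: str) -> Dict[str, str]:
        # Assumption: function-level hints take precedence over
        # service-level hints with the same key.
        merged = dict(self.service)
        merged.update(self.functions.get(function_name, {}))
        return merged


# Hypothetical hint names and values, chosen only for this example.
hints = ServiceHints(
    service={"perf_goal": "throughput"},
    functions={"get": {"perf_goal": "latency"}},
)

print(hints.resolve("get"))
print(hints.resolve("put"))
```

In this sketch, a function without its own hints falls back to the service-level setting, so a communication engine would only need to consider the resolved hint set for each function rather than every possible configuration.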
