HatRPC: Hint-Accelerated Thrift RPC over RDMA

In this paper, we propose HatRPC, a novel hint-accelerated Remote Procedure Call (RPC) framework built on Apache Thrift over Remote Direct Memory Access (RDMA). HatRPC introduces a hierarchical hint scheme for optimizing heterogeneous RPC services and functions. The hint design consists of service-granularity and function-granularity hints, which serve varied optimization goals and narrow the design space for further optimizing the underlying RDMA communication engine. We also co-design a key-value store, HatKV, with HatRPC and LMDB. We validate and evaluate the effectiveness and efficiency of HatRPC with our proposed Apache Thrift Benchmarks (ATB), YCSB, and TPC-H workloads. Performance evaluations show that HatRPC delivers up to 55% performance improvement on the ATB benchmarks and up to 1.51X speedup on TPC-H queries compared with vanilla Thrift over IPoIB. In addition, the co-designed HatKV achieves up to 85.5% improvement on YCSB workloads.
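The abstract does not spell out how the two hint levels interact. The following is a minimal Python sketch of one possible hierarchical resolution, assuming that function-granularity hints override service-granularity defaults field by field and that the resolved hints then select among a small set of RDMA transfer strategies. The names `Hints`, `ServiceHints`, and `choose_protocol`, the hint keys `goal` and `payload`, and the strategy labels are all hypothetical illustrations, not HatRPC's actual API or hint vocabulary.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Hints:
    # Hypothetical hint keys; the real HatRPC hint vocabulary is not given here.
    goal: Optional[str] = None     # e.g. "latency" or "throughput"
    payload: Optional[str] = None  # e.g. "small" or "large"

    def merge(self, override: "Hints") -> "Hints":
        """Return hints in which every field set in `override` takes precedence."""
        return Hints(
            goal=override.goal if override.goal is not None else self.goal,
            payload=override.payload if override.payload is not None else self.payload,
        )


@dataclass
class ServiceHints:
    service: Hints                                    # service-granularity defaults
    functions: Dict[str, Hints] = field(default_factory=dict)  # per-function hints

    def resolve(self, function: str) -> Hints:
        # Function-granularity hints refine the service-granularity defaults;
        # functions without their own hints inherit the service hints unchanged.
        return self.service.merge(self.functions.get(function, Hints()))


def choose_protocol(h: Hints) -> str:
    # Hypothetical mapping from resolved hints to an RDMA transfer strategy.
    # Resolving hints first leaves only a few candidate designs per function,
    # which is one reading of "reducing design space".
    if h.payload == "large":
        return "rdma_write_rendezvous"
    if h.goal == "latency":
        return "eager_send_busy_poll"
    return "eager_send_event"


# Usage: a key-value service tuned for throughput, with per-function refinements.
kv = ServiceHints(
    service=Hints(goal="throughput", payload="small"),
    functions={"get": Hints(goal="latency"), "scan": Hints(payload="large")},
)
print(choose_protocol(kv.resolve("get")))   # eager_send_busy_poll
print(choose_protocol(kv.resolve("put")))   # eager_send_event
print(choose_protocol(kv.resolve("scan")))  # rdma_write_rendezvous
```

Keeping the service-level hints as defaults means a developer only annotates the functions whose behavior differs from the rest of the service; everything else is inherited.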
