Dagger: Towards Efficient RPCs in Cloud Microservices With Near-Memory Reconfigurable NICs

Cloud applications increasingly rely on hundreds of loosely coupled microservices to complete user requests while meeting the application's end-to-end QoS requirements. Communication between services accounts for a large fraction of end-to-end latency and can introduce performance unpredictability and QoS violations. This letter presents our early work on Dagger, a hardware acceleration platform for networking designed specifically for the characteristics of microservices. The Dagger architecture relies on an FPGA-based NIC, closely coupled with the processor over a configurable memory interconnect, to offload and accelerate RPC stacks. Unlike traditional cloud systems, which use PCIe links as the NIC I/O interface, we use memory-interconnected FPGAs as networking devices to provide the efficiency, transparency, and programmability needed for fine-grained microservices. We show that this considerably improves CPU utilization and performance for cloud RPCs.
