Ultra-Low-Latency and Flexible In-memory Key-Value Store System Design on CPU-FPGA

In-memory key-value stores (KVS) are critical infrastructure in data centers, and the growth of big data is straining their performance and power consumption, mainly because of the low efficiency of the multi-level memory hierarchy in CPU-based systems. Remote direct memory access (RDMA) partly alleviates these problems, but it is still inefficient for KVS, especially for the PUT operation. In this paper, we present an ultra-low-latency and flexible in-memory KVS system based on a CPU-FPGA heterogeneous architecture, in which the FPGA serves as a KVS accelerator. We design a highly parallel accelerator architecture with several novel techniques, including memory pre-allocation, fragmentation processing, and a decoupling design, to achieve ultra-low latency together with high flexibility, efficiency, and scalability. The decoupling design stores the hash table in onboard DRAM and the values in host memory, so the system workload can scale with the storage capacity. Each KVS operation needs at most one PCIe DMA, which keeps the system efficient. Compared with current hardware-based KVS systems, the proposed system is more flexible: its supported value size range is 4x wider (from 1 byte to 4M bytes). Over 10 Gbps Ethernet, the peak throughput of the system reaches 13.6 million key-value operations per second (Mops), achieving nearly full utilization of the Ethernet bandwidth. The system latency is as low as 1.2 µs for the PUT operation and 1.7 µs for the GET operation, which is 3.8x and 2.0x faster, respectively, than current state-of-the-art KVS systems.
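The decoupling idea can be illustrated with a small software model. This is only a sketch under stated assumptions, not the accelerator design: the class `DecoupledKVS`, its fields, the bump-pointer allocation, and the DMA counter are all hypothetical names and simplifications introduced here. A Python dict stands in for the hash table in onboard DRAM, a `bytearray` stands in for the value region in host memory, and each access to that region is counted as one simulated PCIe DMA.

```python
class DecoupledKVS:
    """Toy model (hypothetical): index 'on the board', values 'in host memory'."""

    def __init__(self, host_bytes: int = 1 << 20) -> None:
        self.index = {}                    # stands in for the hash table in onboard DRAM
        self.host = bytearray(host_bytes)  # stands in for the host-memory value region
        self.next_free = 0                 # bump pointer into the pre-allocated region
        self.dma_count = 0                 # number of simulated PCIe DMA transfers

    def put(self, key: bytes, value: bytes) -> None:
        offset = self.next_free
        if offset + len(value) > len(self.host):
            raise MemoryError("host value region exhausted")
        self.next_free += len(value)
        # Index update touches only the "onboard" table: no PCIe traffic.
        # Simplification: an overwritten key's old slot is never reclaimed.
        self.index[key] = (offset, len(value))
        # Exactly one simulated DMA write carries the value to host memory.
        self.host[offset:offset + len(value)] = value
        self.dma_count += 1

    def get(self, key: bytes):
        entry = self.index.get(key)
        if entry is None:
            return None                    # miss is resolved from the index: zero DMA
        offset, length = entry
        self.dma_count += 1                # exactly one simulated DMA read
        return bytes(self.host[offset:offset + length])


kvs = DecoupledKVS()
kvs.put(b"user:1", b"alice")
value = kvs.get(b"user:1")   # hit: one DMA read
missing = kvs.get(b"nope")   # miss: no DMA at all
```

In this model a PUT or a GET hit costs one transfer and a GET miss costs none, which matches the "at most one PCIe DMA" property stated above. Growing the value region does not grow per-entry index work, which is the intuition behind scaling with storage capacity.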
