Building a high-performance key-value cache as an energy-efficient appliance

Abstract: Key–value (KV) stores have become a critical infrastructure component supporting various services in the cloud. Although KV stores have long been considered memory-bound and network-bound, recent implementations on multicore servers are increasingly CPU-bound instead. This limitation often leads to under-utilization of available bandwidth and poor energy efficiency, as well as long response times under heavy load. To address these issues, we present Hippos, a high-throughput, low-latency, and energy-efficient key–value store implementation. Hippos moves the KV store into the operating system’s kernel and thus removes most of the overhead associated with the network stack and system calls. Hippos uses the Netfilter framework to quickly handle UDP packets, removing the overhead of UDP-based GET requests almost entirely. Combined with lock-free multithreaded data access, Hippos removes several performance bottlenecks both internal and external to the KV-store application. We prototyped Hippos as a Linux loadable kernel module and evaluated it against the ubiquitous Memcached using various micro-benchmarks and workloads from Facebook’s production systems. The experiments show that Hippos improves throughput by 20%–200% on a 1 Gbps network (up to 590% on a 10 Gbps network) and reduces power consumption by 5%–20% compared with Memcached.
