Modeling and Analyzing Latency in the Memcached System

Memcached is a widely used in-memory caching solution in large-scale search scenarios. Its most important performance metric is latency, which is affected by several factors, including the workload pattern, the service rate, unbalanced load distribution among servers, and the cache miss ratio. To quantify the impact of each factor on latency, we establish a theoretical model of the Memcached system. Specifically, we model the unbalanced load distribution among Memcached servers as a set of probabilities, capture bursty and concurrent key arrivals at Memcached servers in the form of batching blocks, and add a cache-miss processing stage. Based on this model, we derive algebraic estimates of latency in Memcached and validate them through extensive experiments. Moreover, we obtain a quantitative understanding of how much latency can be improved by optimizing each factor, and we provide several useful recommendations for optimizing latency in Memcached.
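The abstract names the model's ingredients (per-server load probabilities, batched key arrivals, a service rate, and a cache-miss stage) but not its equations. As a hedged illustration only, the sketch below simulates a model built from the same ingredients so their effect on mean key latency can be explored numerically. It is not the authors' analytical model: Poisson batch arrivals, fixed batch size, FIFO single-server queues with exponential service, and a non-queuing (infinite-server) miss stage with exponential delay are assumptions made here, and `simulate_mean_latency` and all of its parameter names are hypothetical.

```python
import random
from typing import Sequence


def simulate_mean_latency(
    load_probs: Sequence[float],
    batch_rate: float,
    batch_size: int,
    service_rate: float,
    miss_ratio: float,
    miss_penalty: float,
    num_batches: int = 50_000,
    seed: int = 0,
) -> float:
    """Estimate mean per-key latency by discrete-event simulation.

    Hypothetical model, assumed for illustration only:
      * batches of `batch_size` keys arrive as a Poisson process (rate `batch_rate`);
      * each key independently goes to server i with probability load_probs[i];
      * each server is a FIFO single-server queue with exponential service
        times of rate `service_rate`;
      * a key misses with probability `miss_ratio` and then incurs an extra
        exponential delay with mean `miss_penalty` (no queuing at that stage).
    """
    # Every server must have utilization below 1, otherwise its queue grows
    # without bound; the hottest server is the binding constraint.
    total = sum(load_probs)
    key_rate = batch_rate * batch_size
    if max(load_probs) / total * key_rate >= service_rate:
        raise ValueError("hottest server is overloaded (utilization >= 1)")

    rng = random.Random(seed)
    servers = range(len(load_probs))
    free_at = [0.0] * len(load_probs)  # time each server finishes its queued work
    now = 0.0
    total_latency = 0.0
    for _ in range(num_batches):
        now += rng.expovariate(batch_rate)  # arrival time of the next batch
        # All keys in a batch arrive together and are spread over servers.
        targets = rng.choices(servers, weights=load_probs, k=batch_size)
        for s in targets:
            start = max(now, free_at[s])  # wait behind earlier keys (FIFO)
            free_at[s] = start + rng.expovariate(service_rate)
            latency = free_at[s] - now
            missed = rng.random() < miss_ratio
            if missed and miss_penalty > 0:
                latency += rng.expovariate(1.0 / miss_penalty)
            total_latency += latency
    return total_latency / (num_batches * batch_size)


if __name__ == "__main__":
    common = dict(batch_rate=0.5, batch_size=4, service_rate=1.0)
    balanced = simulate_mean_latency([0.25] * 4, miss_ratio=0.0, miss_penalty=0.0, **common)
    skewed = simulate_mean_latency([0.4, 0.2, 0.2, 0.2], miss_ratio=0.0, miss_penalty=0.0, **common)
    with_misses = simulate_mean_latency([0.25] * 4, miss_ratio=0.1, miss_penalty=10.0, **common)
    print(f"balanced={balanced:.2f} skewed={skewed:.2f} with_misses={with_misses:.2f}")
```

With the same total key rate, shifting load toward one server should raise mean latency noticeably, because that server's utilization (0.8 here versus 0.5 when balanced) dominates its queueing delay; a nonzero miss ratio adds roughly `miss_ratio * miss_penalty` on top under this simplified miss stage. With `batch_size=1` and a single server the model reduces to an M/M/1 queue, whose mean sojourn time 1/(μ − λ) gives a quick sanity check.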