NbQ-CLOCK: A Non-blocking Queue-based CLOCK Algorithm for Web-Object Caching

Major Internet-based service providers rely on high-throughput web-object caches to serve millions of daily accesses to frequently viewed web content. A web-object cache’s ability to reduce user access time depends on its replacement algorithm and the cache hit rate that algorithm yields. In this report, I present NbQ-CLOCK, a novel, lock-free variant of the Generalized CLOCK algorithm that is particularly suited to web-object caching. Instead of the traditional circular buffer, NbQ-CLOCK is built on an unbounded non-blocking queue that requires no internal dynamic memory management. My solution retains Generalized CLOCK’s low-latency updates and high hit rates, and its non-blocking implementation makes it scalable at a cost of only 10 bytes of per-object space overhead. I compare the solution to existing algorithms, including Intel’s Bag-LRU, and demonstrate that NbQ-CLOCK’s fast update operation scales well with the number of threads. In an in-memory key-value store prototype, NbQ-CLOCK improves overall throughput by as much as 9.20% over the best of the other algorithms, and its hit rate exceeds that of the next best algorithm by as much as 1.40%.
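To make the queue-based idea concrete, the following is a minimal, single-threaded sketch of a Generalized CLOCK policy that sweeps a FIFO queue rather than a circular buffer. It is an illustration only and not the NbQ-CLOCK implementation: it is not lock-free, and the class name `QueueClockCache`, its methods, and the choice to start each reference counter at zero are assumptions made for this example.

```python
from collections import deque


class QueueClockCache:
    """Illustrative Generalized CLOCK over a FIFO queue (hypothetical names).

    Single-threaded sketch: the real algorithm uses a non-blocking queue.
    """

    def __init__(self, capacity):
        assert capacity > 0, "capacity must be positive"
        self.capacity = capacity
        self.queue = deque()   # queue order plays the role of the clock hand
        self.entries = {}      # key -> [value, reference_count]

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None        # cache miss
        entry[1] += 1          # update on hit: bump the counter, no reordering
        return entry[0]

    def put(self, key, value):
        if key in self.entries:
            self.entries[key][0] = value
            return
        while len(self.entries) >= self.capacity:
            self._evict_one()
        self.entries[key] = [value, 0]   # assumption: new objects start at 0
        self.queue.append(key)

    def _evict_one(self):
        # Sweep from the head: referenced objects get another chance.
        while True:
            key = self.queue.popleft()
            entry = self.entries[key]
            if entry[1] > 0:
                entry[1] -= 1            # decrement and move to the tail
                self.queue.append(key)
            else:
                del self.entries[key]    # counter is zero: evict
                return key


cache = QueueClockCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # "a" is now referenced once
cache.put("c", 3)     # sweep skips "a" and evicts "b"
```

In this sketch a hit only increments a counter, which is the property that keeps updates cheap; all reordering work is deferred to eviction time.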
