Agar: A Caching System for Erasure-Coded Data

Erasure coding is an established data protection mechanism. It provides high resiliency with low storage overhead, which makes it very attractive to storage system developers. Unfortunately, when used in a distributed setting, erasure coding hampers a storage system's performance, because it requires clients to contact several, possibly remote, sites to retrieve their data. This has hindered the adoption of erasure coding in practice, limiting its use to cold, archival data. Recent research has shown that it is feasible to use erasure coding for hot data as well, thus opening new perspectives for improving erasure-coded storage systems. In this paper, we address the problem of minimizing access latency in erasure-coded storage. We propose Agar, a novel caching system tailored for erasure-coded content. Agar optimizes the contents of the cache based on live information about data popularity and the access latency to different data storage sites. Our system adapts a dynamic programming algorithm to optimize the choice of data blocks that are cached, using an approach akin to "Knapsack" algorithms. We compare Agar to the classical Least Recently Used and Least Frequently Used cache eviction policies, while varying the amount of data cached per object from a single data chunk to a whole replica of the object. We show that Agar can achieve 16% to 41% lower latency than systems that use classical caching policies.
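To make the "Knapsack"-style idea concrete, the following is a minimal sketch and not Agar's actual algorithm. It assumes that each object needs any k chunks to be read, that each storage site holds one chunk, that a read is as slow as the slowest chunk fetched, and that the benefit of caching is popularity times the latency saved. The names read_latency and choose_cache_contents, and all numbers, are hypothetical.

```python
from typing import Dict, List, Tuple


def read_latency(site_latencies: List[float], k: int, cached: int,
                 cache_latency: float = 0.0) -> float:
    """Latency of one read when `cached` of the k required chunks are in cache.

    Assumption: the client fetches the remaining chunks from the fastest
    sites in parallel, so the read takes as long as the slowest of them.
    """
    remote_needed = k - cached
    if remote_needed <= 0:
        return cache_latency
    fastest = sorted(site_latencies)[:remote_needed]
    return max(cache_latency, fastest[-1])


def choose_cache_contents(
    objects: Dict[str, Tuple[float, List[float]]], k: int, capacity: int
) -> Tuple[float, Dict[str, int]]:
    """Multiple-choice knapsack: pick how many chunks of each object to cache.

    objects maps a name to (popularity, per-site latencies); capacity is the
    cache size in chunks. Returns (total weighted latency saved, allocation).
    """
    # best[w] = best (value, allocation) using at most w cache slots
    best: List[Tuple[float, Dict[str, int]]] = [(0.0, {}) for _ in range(capacity + 1)]
    for name, (popularity, latencies) in objects.items():
        base = read_latency(latencies, k, 0)
        new_best = list(best)
        for w in range(capacity + 1):
            for c in range(1, min(k, w) + 1):
                gain = popularity * (base - read_latency(latencies, k, c))
                prev_value, prev_alloc = best[w - c]
                if prev_value + gain > new_best[w][0]:
                    new_best[w] = (prev_value + gain, {**prev_alloc, name: c})
        best = new_best
    return best[capacity]


if __name__ == "__main__":
    # Hypothetical workload: a popular object with one slow site on its read
    # path, and an unpopular object whose sites are all similar.
    demo = {
        "A": (10, [10, 100, 200]),
        "B": (1, [50, 60, 70]),
    }
    print(choose_cache_contents(demo, k=2, capacity=3))
```

In this toy setting the table favors caching chunks that remove a slow site from the read path of a popular object, which is the kind of trade-off between popularity and site latency that the abstract describes; a recency-only or frequency-only policy would not weigh the two together.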
