Popularity-based full replica caching for erasure-coded distributed storage systems

In most storage systems, storage nodes keep data on a local filesystem. Unless they have a dedicated caching layer, they therefore benefit from the usual filesystem cache held in the host's free memory. In erasure-coded storage systems, however, caching is effective only if all the systematic fragments of an object are in the cache. In this work, we propose a new caching policy that adapts traditional methods to erasure-coded storage systems. The main idea of our solution is to cache a full object rather than its individual fragments. A simulation-based evaluation shows that our full replica solution improves the cache hit ratio and reduces the cache waste ratio compared to the traditional caching method. An experimental evaluation of our implementation confirms these results and further shows that cache hits on full replicas yield a better request response time.
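As a rough illustration of the hit condition described above, the sketch below contrasts per-node fragment caches with a full replica cache. The fragment count `K`, the function names, and the "stranded fragment" count are assumptions made for this example; they are not taken from the implementation, and the count is not the paper's definition of the cache waste ratio.

```python
K = 4  # hypothetical number of systematic fragments per object


def fragment_hit(obj_id, node_caches, k=K):
    """Fragment-level caching: a read is a hit only if every one of the
    k systematic fragments is present in the cache of its storage node."""
    return all((obj_id, i) in node_caches[i] for i in range(k))


def full_replica_hit(obj_id, replica_cache):
    """Full replica caching: a single lookup decides hit or miss."""
    return obj_id in replica_cache


def stranded_fragments(node_caches, k=K):
    """Count cached fragments whose object cannot be served from cache
    because at least one sibling fragment is missing (illustrative only)."""
    cached = [key for cache in node_caches for key in cache]
    return sum(1 for (obj_id, _) in cached
               if not fragment_hit(obj_id, node_caches, k))


# One cache (a set of (object, fragment index) pairs) per storage node.
node_caches = [set() for _ in range(K)]
for i in range(K):
    node_caches[i].add(("a", i))      # object "a": all fragments cached
for i in range(K - 1):
    node_caches[i].add(("b", i))      # object "b": last fragment evicted

replica_cache = {"a", "b"}            # whole objects cached in one place

print(fragment_hit("a", node_caches))      # all fragments cached
print(fragment_hit("b", node_caches))      # one fragment missing
print(stranded_fragments(node_caches))     # fragments of "b" still in memory
print(full_replica_hit("b", replica_cache))
```

In this toy setup, object "b" misses under fragment-level caching even though three of its four fragments occupy cache memory, while the full replica cache serves it with one lookup.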
