HCCache: A Hybrid Client-Side Cache Management Scheme for I/O-intensive Workloads in Network-Based File Systems

Client-side caching is an effective technique for improving I/O performance in network-based file systems. However, the current block-indexed caching structure suffers from poor cache efficiency under high concurrency, especially for small-file workloads. In this paper, we present a hybrid client-side caching (HCCache) scheme that avoids the performance degradation caused by the block interleaving problem and increases the efficiency of cache data management by customizing the content-addressable level for files of different sizes. We also propose two new metrics that evaluate cache efficiency more accurately, based on an analysis of the shortcomings of the hit-rate metric. Extensive simulations show that HCCache improves the I/O performance of small files by 34.2 percent in aggregate I/O bandwidth and 6.1 percent in access latency. HCCache also significantly reduces the lookup times of content-addressable data blocks, which contributes to the lower access latency for small files.
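As a rough illustration of what "customizing the content-addressable level for files of different sizes" could look like, the following minimal sketch indexes small files as one content-addressed unit and large files as fixed-size content-addressed blocks. It is not the HCCache implementation: the class name `HybridCache`, the `small_file_limit` threshold, the block size, and the use of SHA-256 digests as content addresses are all assumptions made for this example.

```python
import hashlib


class HybridCache:
    """Toy hybrid cache (hypothetical, not the HCCache design itself)."""

    def __init__(self, block_size=4096, small_file_limit=16384):
        # Both sizes are illustrative assumptions, not values from the paper.
        self.block_size = block_size
        self.small_file_limit = small_file_limit
        self.index = {}  # file name -> ordered list of content digests
        self.store = {}  # content digest -> cached bytes

    def put(self, name, data):
        if len(data) <= self.small_file_limit:
            # Small file: address the whole file as a single unit.
            chunks = [data]
        else:
            # Large file: address each fixed-size block separately.
            chunks = [
                data[i:i + self.block_size]
                for i in range(0, len(data), self.block_size)
            ]
        digests = []
        for chunk in chunks:
            digest = hashlib.sha256(chunk).hexdigest()
            # Identical content is stored only once.
            self.store.setdefault(digest, chunk)
            digests.append(digest)
        self.index[name] = digests

    def get(self, name):
        digests = self.index.get(name)
        if digests is None:
            return None  # cache miss
        return b"".join(self.store[d] for d in digests)

    def chunk_count(self, name):
        # Number of content-addressed units that must be resolved on a read.
        return len(self.index.get(name, []))


cache = HybridCache()
cache.put("small.dat", b"a" * 10000)
cache.put("large.dat", bytes(range(256)) * 80)
print(cache.chunk_count("small.dat"), cache.chunk_count("large.dat"))
```

In this sketch, a read of a small file resolves a single content address, while a large file resolves one address per block, which is the kind of difference in lookup work the abstract alludes to.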
