Nexus: a novel weighted-graph-based prefetching algorithm for metadata servers in petabyte-scale storage systems

An efficient, accurate, and distributed metadata-oriented prefetching scheme is critical to the overall performance of large distributed storage systems. In this paper, we present a novel weighted-graph-based prefetching technique, built on successor relationships, that targets clustered metadata servers, an arrangement envisioned as necessary for petabyte-scale distributed storage systems. Extensive trace-driven simulations show that, compared with LRU and an existing state-of-the-art prefetching algorithm, our new prefetching algorithm increases the hit rate for metadata accesses on the client side by up to 13% and reduces the average response time of metadata operations by up to 67%.
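To make the idea of a weighted graph built on successor relationships concrete, the following is a minimal illustrative sketch and not the algorithm evaluated in the paper. It assumes a hypothetical history window (`window`), a linearly decaying edge-weight increment for more distant predecessors, and a hypothetical `top_k` limit on prefetch candidates; none of these names or parameter choices are taken from the abstract.

```python
from collections import defaultdict, deque


class SuccessorGraphPrefetcher:
    """Toy weighted successor graph (illustrative assumptions only)."""

    def __init__(self, window=2, top_k=2):
        self.window = window    # hypothetical: how many past requests to link
        self.top_k = top_k      # hypothetical: how many candidates to prefetch
        self.recent = deque(maxlen=window)
        # weights[pred][succ] = accumulated edge weight pred -> succ
        self.weights = defaultdict(lambda: defaultdict(int))

    def record(self, item):
        """Update edge weights from recent predecessors to `item`."""
        # Assumed policy: the nearest predecessor gets the largest increment,
        # decreasing by one for each additional step of distance.
        for distance, pred in enumerate(reversed(self.recent), start=1):
            if pred != item:
                self.weights[pred][item] += self.window - distance + 1
        self.recent.append(item)

    def predict(self, item):
        """Return up to top_k successors of `item`, heaviest edge first."""
        successors = self.weights.get(item, {})
        # Ties are broken by name so the result is deterministic.
        ranked = sorted(successors.items(), key=lambda kv: (-kv[1], kv[0]))
        return [name for name, _ in ranked[: self.top_k]]


if __name__ == "__main__":
    prefetcher = SuccessorGraphPrefetcher(window=2, top_k=2)
    for name in ["a", "b", "c", "a", "b", "d", "a", "b", "c"]:
        prefetcher.record(name)
    print(prefetcher.predict("a"))
```

In this sketch a metadata server would call `record` on every metadata request and `predict` on the requested item to choose which entries to push to the client cache. A real deployment would also need to bound the graph's memory use and age out stale edges, which this sketch omits.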
