AMP: An Affinity-Based Metadata Prefetching Scheme in Large-Scale Distributed Storage Systems

Prefetching is an effective technique for improving file access performance and can significantly reduce access latency in I/O systems. In distributed storage systems, metadata prefetching is critical to overall system performance. This paper proposes an affinity-based metadata prefetching (AMP) scheme that provides aggressive metadata prefetching for metadata servers in large-scale distributed storage systems. By mining the history of past metadata accesses, AMP discovers affinities among metadata files and uses them to decide what to prefetch. In trace-driven simulations, AMP improves buffer cache hit rates by up to 12%, 4.5%, and 4% compared with LRU and two recent file prefetching algorithms, Nexus and C-Miner, respectively, and reduces average response time by up to 60%, 12%, and 8%, respectively.
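
The abstract does not spell out how affinities are mined, so the following is only a minimal sketch of the general idea of history-based affinity prefetching, not the AMP algorithm itself. It assumes a simple sliding-window co-occurrence count as the affinity measure; the names `mine_affinities`, `prefetch_candidates`, `window`, `k`, and `min_count` are hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict


def mine_affinities(trace, window=2):
    """Count how often file b is accessed within `window` accesses after file a.

    Assumption: co-occurrence inside a short look-ahead window stands in for
    "affinity"; the paper's actual mining method may differ.
    """
    follows = defaultdict(Counter)
    for i, a in enumerate(trace):
        for b in trace[i + 1 : i + 1 + window]:
            if b != a:
                follows[a][b] += 1
    return follows


def prefetch_candidates(follows, current, k=2, min_count=2):
    """Return up to k files that most often followed `current` in the history."""
    counts = follows.get(current, Counter())
    return [f for f, c in counts.most_common(k) if c >= min_count]


# Hypothetical access history of metadata files.
trace = ["a", "b", "c", "a", "b", "d", "a", "b", "c"]
follows = mine_affinities(trace, window=2)
print(prefetch_candidates(follows, "a"))  # ['b', 'c']
```

In this toy history, an access to `a` is usually followed by `b` and `c`, so a metadata server using such a table could prefetch those two entries when `a` is requested. The `min_count` threshold is one simple way to avoid prefetching on weak evidence.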
