Optimization of Index-Based Method of Metadata Search for Large-Scale File Systems

With the rapid growth of the information era, the volume of data held in file systems has reached the terabyte or even exabyte scale. In such large-scale file systems, metadata search determines how quickly the required files can be retrieved. Metadata search has been studied extensively from a range of theoretical perspectives, and this work has produced significant results. Prior studies show that the distribution of metadata exhibits sub-tree locality and horizontal locality, and that metadata search requests follow a heavy-tailed distribution. However, current methods do not make full use of this spatial locality or of the load characteristics of metadata search. In this paper, we propose a metadata search method based on a keyword index. Our method exploits the characteristics of both metadata distribution and metadata search by partitioning the indexes appropriately. Experimental results show that our partitioning method is more efficient than the current non-partitioning method.
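As a rough illustration of why partitioning a keyword index can help when metadata shows sub-tree locality, the sketch below keeps one small inverted index per directory sub-tree and consults only the partitions that overlap the queried sub-tree. It is not the method proposed in the paper: the class `PartitionedIndex`, the helpers `partition_key` and `is_under`, and the fixed-depth partitioning rule are all assumptions made for this example.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def partition_key(path: str, depth: int = 1) -> str:
    """Hypothetical rule: a file belongs to the partition named by the
    first `depth` directory components of its absolute path."""
    parts = PurePosixPath(path).parts      # e.g. ('/', 'home', 'alice', 'a.txt')
    dirs = parts[1:-1]                     # drop the root and the file name
    return "/" + "/".join(dirs[:depth])


def is_under(path: str, root: str) -> bool:
    """True if `path` equals `root` or lies inside the directory `root`."""
    root = root.rstrip("/")
    return path == root or path.startswith(root + "/")


class PartitionedIndex:
    """One keyword -> paths inverted index per sub-tree partition."""

    def __init__(self, depth: int = 1):
        self.depth = depth
        self.partitions = defaultdict(lambda: defaultdict(set))

    def add(self, path: str, keywords):
        part = self.partitions[partition_key(path, self.depth)]
        for kw in keywords:
            part[kw].add(path)

    def search(self, keyword: str, subtree: str = "/"):
        """Return (sorted matching paths, number of partitions consulted)."""
        hits, consulted = set(), 0
        for key, part in self.partitions.items():
            # Skip partitions that cannot overlap the queried sub-tree.
            if not (is_under(key, subtree) or is_under(subtree, key)):
                continue
            consulted += 1
            hits.update(p for p in part.get(keyword, ()) if is_under(p, subtree))
        return sorted(hits), consulted
```

In this sketch, a query restricted to one sub-tree skips every partition outside it, whereas a single non-partitioned index would have to filter the posting list for the whole file system. How partitions should actually be chosen (for example, by observed load rather than fixed depth) is left open here.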
