Adaptive and scalable metadata management to support a trillion files

Nowadays, more and more applications require file systems to efficiently maintain millions of files or more. Providing high access performance with such a huge number of files and such large directories is a major challenge for cluster file systems. Limited by static directory structures, existing file systems are prohibitively inefficient for this use. To address this problem, we present a scalable and adaptive metadata management system that aims to maintain a trillion files efficiently. First, our system exploits adaptive two-level directory partitioning based on extendible hashing to manage very large directories. Second, it uses fine-grained parallel processing within a directory, greatly improving the performance of file creation and deletion. Third, it employs multi-layered metadata caching, which improves memory utilization on the servers. Finally, it uses a dynamic load-balancing mechanism based on consistent hashing, which allows the system to scale up and down easily. Performance results on 32 metadata servers show that our user-level prototype can create more than 74 thousand files per second and retrieve the attributes of more than 270 thousand files per second in a single directory containing 100 million files. Moreover, it delivers a peak throughput of more than 60 thousand file creates per second in a single directory with 1 billion files.
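
To make the directory-partitioning idea concrete, below is a minimal sketch of classic extendible hashing applied to a single directory: filenames are hashed, the low-order bits of the hash select a partition, and a full partition splits (doubling the partition table when necessary) instead of rehashing the whole directory. The class names, the toy partition capacity, and the single-machine, in-memory table are illustrative assumptions; the paper's system layers a second, server-level partitioning on top of this and distributes partitions across metadata servers.

```python
import hashlib


class Partition:
    """One directory partition (bucket); in the real system this would live on a metadata server."""

    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.entries = {}  # filename -> metadata (e.g. a toy inode record)


class ExtendibleDirectory:
    """Hash-partitioned directory that splits a partition when it fills up."""

    def __init__(self, capacity=4):
        self.capacity = capacity      # max entries per partition (toy value)
        self.global_depth = 1
        self.table = [Partition(1), Partition(1)]  # indexed by low-order hash bits

    def _hash(self, name):
        return int(hashlib.md5(name.encode()).hexdigest(), 16)

    def _index(self, name):
        return self._hash(name) & ((1 << self.global_depth) - 1)

    def lookup(self, name):
        return self.table[self._index(name)].entries.get(name)

    def insert(self, name, metadata):
        part = self.table[self._index(name)]
        if name in part.entries or len(part.entries) < self.capacity:
            part.entries[name] = metadata
            return
        self._split(part)
        self.insert(name, metadata)   # retry; the target partition now has room

    def _split(self, part):
        if part.local_depth == self.global_depth:
            # Double the partition table; only pointers are copied, not entries.
            self.table = self.table + self.table
            self.global_depth += 1
        part.local_depth += 1
        sibling = Partition(part.local_depth)
        new_bit = part.local_depth - 1
        # Table slots whose new distinguishing bit is 1 now point to the sibling.
        for i in range(len(self.table)):
            if self.table[i] is part and (i >> new_bit) & 1:
                self.table[i] = sibling
        # Rehash only the entries of the split partition between the two halves.
        old_entries, part.entries = part.entries, {}
        for n, md in old_entries.items():
            self.table[self._index(n)].entries[n] = md


if __name__ == "__main__":
    d = ExtendibleDirectory(capacity=4)
    for i in range(1000):
        d.insert("file%d" % i, {"inode": i})
    print("global depth:", d.global_depth, "lookup:", d.lookup("file42"))
```

The appeal of this scheme for very large directories is that each split touches only the entries of one partition, so a directory can grow from thousands to billions of files incrementally, without a global reorganization.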
