Adaptive and scalable metadata management to support a trillion files

Nowadays, more and more applications require file systems to efficiently maintain millions of files or more. Providing high access performance with such a huge number of files and such large directories is a major challenge for cluster file systems. Limited by static directory structures, existing file systems are prohibitively inefficient for this use. To address this problem, we present a scalable and adaptive metadata management system that aims to maintain a trillion files efficiently. First, our system exploits adaptive two-level directory partitioning based on extendible hashing to manage very large directories. Second, it uses fine-grained parallel processing within a directory, greatly improving the performance of file creation and deletion. Third, it employs multi-layered metadata caching, which improves memory utilization on the servers. Finally, it uses a dynamic load-balancing mechanism based on consistent hashing, which allows the system to scale up and down easily. Performance results on 32 metadata servers show that our user-level prototype can create more than 74 thousand files per second and retrieve the attributes of more than 270 thousand files per second in a single directory containing 100 million files. Moreover, it delivers a peak throughput of more than 60 thousand file creates per second in a single directory with 1 billion files.
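
To make the directory-partitioning idea concrete, below is a minimal sketch of classic extendible hashing applied to a single directory: filenames are hashed, the low-order bits of the hash select a partition, and a full partition splits (doubling the partition table when necessary) instead of rehashing the whole directory. The class names, the toy partition capacity, and the single-machine, in-memory table are illustrative assumptions; the paper's system layers a second, server-level partitioning on top of this and distributes partitions across metadata servers.

```python
import hashlib


class Partition:
    """One directory partition (bucket); in the real system this would live on a metadata server."""

    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.entries = {}  # filename -> metadata (e.g. a toy inode record)


class ExtendibleDirectory:
    """Hash-partitioned directory that splits a partition when it fills up."""

    def __init__(self, capacity=4):
        self.capacity = capacity      # max entries per partition (toy value)
        self.global_depth = 1
        self.table = [Partition(1), Partition(1)]  # indexed by low-order hash bits

    def _hash(self, name):
        return int(hashlib.md5(name.encode()).hexdigest(), 16)

    def _index(self, name):
        return self._hash(name) & ((1 << self.global_depth) - 1)

    def lookup(self, name):
        return self.table[self._index(name)].entries.get(name)

    def insert(self, name, metadata):
        part = self.table[self._index(name)]
        if name in part.entries or len(part.entries) < self.capacity:
            part.entries[name] = metadata
            return
        self._split(part)
        self.insert(name, metadata)   # retry; the target partition now has room

    def _split(self, part):
        if part.local_depth == self.global_depth:
            # Double the partition table; only pointers are copied, not entries.
            self.table = self.table + self.table
            self.global_depth += 1
        part.local_depth += 1
        sibling = Partition(part.local_depth)
        new_bit = part.local_depth - 1
        # Table slots whose new distinguishing bit is 1 now point to the sibling.
        for i in range(len(self.table)):
            if self.table[i] is part and (i >> new_bit) & 1:
                self.table[i] = sibling
        # Rehash only the entries of the split partition between the two halves.
        old_entries, part.entries = part.entries, {}
        for n, md in old_entries.items():
            self.table[self._index(n)].entries[n] = md


if __name__ == "__main__":
    d = ExtendibleDirectory(capacity=4)
    for i in range(1000):
        d.insert("file%d" % i, {"inode": i})
    print("global depth:", d.global_depth, "lookup:", d.lookup("file42"))
```

The appeal of this scheme for very large directories is that each split touches only the entries of one partition, so a directory can grow from thousands to billions of files incrementally, without a global reorganization.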
