Hierarchical Bloom filter arrays (HBA): a novel, scalable metadata management system for large cluster-based storage

An efficient, distributed scheme for file mapping (file lookup) is critical to decentralizing metadata management across a group of metadata servers. This work presents a technique called HBA (hierarchical Bloom filter arrays) that maps file names to the servers holding their metadata. Each metadata server keeps two levels of probabilistic arrays, i.e., Bloom filter arrays, with different accuracies. The lower-accuracy array represents the distribution of the entire metadata and trades accuracy for significantly reduced memory overhead. The higher-accuracy array caches partial distribution information and exploits the temporal locality of file access patterns. Extensive trace-driven simulations show the HBA design to be highly effective and efficient in improving the performance and scalability of file systems in clusters with 1,000 to 10,000 nodes (or superclusters).
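
The two-level lookup described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class and function names, the filter sizes, the double-hashing scheme built on SHA-256, and the rule "accept a level only when exactly one filter reports a hit, otherwise fall through" are all assumptions made for the example. The abstract does not specify the lookup order or what happens when both levels are inconclusive.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter; sizes and hash scheme are illustrative assumptions."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from two 64-bit halves of one SHA-256 digest
        # (double hashing). This choice is an assumption, not from the paper.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the stride non-zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, key):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))


def lookup(filename, hot_array, global_array):
    """Return (level, server_id); ("fallback", None) if neither level is conclusive.

    hot_array:    one high-accuracy filter per server, covering recently used files.
    global_array: one low-accuracy filter per server, covering all metadata.
    Trusting only a unique hit is an assumed policy for this sketch.
    """
    for level, array in (("hot", hot_array), ("global", global_array)):
        hits = [sid for sid, bf in enumerate(array) if filename in bf]
        if len(hits) == 1:
            return level, hits[0]
    return "fallback", None


# Hypothetical setup: 4 metadata servers, 12 files spread round-robin.
NUM_SERVERS = 4
global_array = [BloomFilter(1024, 3) for _ in range(NUM_SERVERS)]  # fewer bits, less accurate
hot_array = [BloomFilter(4096, 6) for _ in range(NUM_SERVERS)]     # more bits, more accurate

for i in range(12):
    global_array[i % NUM_SERVERS].add(f"/data/file{i}")
hot_array[1].add("/data/file5")  # pretend file5 was accessed recently

print(lookup("/data/file5", hot_array, global_array))
print(lookup("/data/file6", hot_array, global_array))
```

Because a Bloom filter never yields false negatives, the server that really holds a file always reports a hit. The only uncertainty comes from false positives in other servers' filters, which is why the sketch treats multiple hits as inconclusive and moves to the next level.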
