DROP: Facilitating distributed metadata management in EB-scale storage systems

Efficient and scalable distributed metadata management is critical to overall performance in large-scale distributed storage systems, especially in the exabyte (EB) era. Traditional distributed metadata management schemes include hash-based mapping and subtree partitioning. Hash-based mapping distributes workload evenly among metadata servers, but it eliminates the hierarchical locality of metadata. As a result, it cannot efficiently handle operations such as renaming or moving a directory, which require metadata to be migrated among metadata servers. Subtree partitioning, in contrast, does not distribute workload uniformly among metadata servers, so metadata must be migrated to keep the load roughly balanced. In this paper, we present a ring-based metadata management scheme called Dynamic Ring Online Partitioning (DROP). DROP preserves metadata locality through locality-preserving hashing and dynamically distributes metadata across the metadata server cluster to keep the load balanced. Our performance evaluation demonstrates the effectiveness and scalability of DROP.
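The contrast between hash-based mapping and a locality-preserving ring can be illustrated with a toy model. The sketch below rests on several assumptions that are not taken from the abstract: the ring key is simply the full pathname compared lexicographically (a stand-in, since the abstract does not specify DROP's locality-preserving hash), each metadata server owns one contiguous arc of the key space, and rebalancing is a simple greedy cut over cumulative load rather than DROP's own algorithm. The names `hash_server`, `RangeRing`, and `balanced` are hypothetical.

```python
from __future__ import annotations

import hashlib
from bisect import bisect_right


def hash_server(path: str, n_servers: int) -> int:
    """Hash-based mapping: even load, but siblings land on unrelated servers."""
    digest = hashlib.sha1(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_servers


class RangeRing:
    """Hypothetical locality-preserving placement (not DROP's actual hash).

    The key of an entry is its full pathname compared lexicographically, so
    every subtree such as "/home/alice/..." occupies one contiguous arc of the
    key space. Server i owns the arc [boundaries[i], boundaries[i + 1]).
    """

    def __init__(self, boundaries: list[str]):
        # boundaries[0] == "" guarantees that every pathname has an owner.
        assert boundaries and boundaries[0] == ""
        assert boundaries == sorted(boundaries)
        self.boundaries = boundaries

    def server_for(self, path: str) -> int:
        # The owner is the server whose boundary is the last one <= path.
        return bisect_right(self.boundaries, path) - 1

    @classmethod
    def balanced(cls, load: dict[str, float], n_servers: int) -> RangeRing:
        """Greedy cut over cumulative load so each arc carries ~1/n of it.

        A simple stand-in for rebalancing, not DROP's own algorithm.
        """
        total = sum(load.values())
        boundaries = [""]
        running = 0.0
        for key in sorted(load):
            # Open a new arc once the arcs so far hold their share of load.
            share = total * len(boundaries) / n_servers
            if len(boundaries) < n_servers and running >= share:
                boundaries.append(key)
            running += load[key]
        return cls(boundaries)


# Usage: 4 directories x 100 files, 4 metadata servers, uniform load.
files = [f"/{d}/f{i:03d}" for d in ("home/alice", "home/bob", "usr/lib", "var/log")
         for i in range(100)]
alice = [p for p in files if p.startswith("/home/alice/")]

ring = RangeRing.balanced({p: 1.0 for p in files}, n_servers=4)
print(len({ring.server_for(p) for p in alice}))  # 1: the directory stays on one server
print(len({hash_server(p, 4) for p in alice}))   # hashing scatters the siblings

# Make /home/alice hot: arc boundaries move so the load stays even.
hot = {p: (5.0 if p.startswith("/home/alice/") else 1.0) for p in files}
hot_ring = RangeRing.balanced(hot, n_servers=4)
print([sum(w for p, w in hot.items() if hot_ring.server_for(p) == s) for s in range(4)])
# [200.0, 200.0, 200.0, 200.0]
```

Under uniform load, each directory lands on exactly one server, while hashing spreads the same siblings across the cluster. When `/home/alice` becomes hot, the greedy cut splits it across three adjacent arcs so every server still carries equal load, showing how moving arc boundaries trades a little locality for balance.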
