Cx: Concurrent Execution for the Cross-Server Operations in a Distributed File System

A distributed metadata service is important for metadata-intensive applications. Unfortunately, it gives rise to cross-server file operations, and keeping these operations consistent is a performance challenge. Their sub-operations are executed sequentially, and each operation requires a costly immediate commitment among the servers involved. In this paper, we observe that in most real applications, sub-operations can be executed concurrently and commitments can be delayed and batched, because temporary inconsistency among servers rarely affects subsequent metadata operations. We propose a new protocol, Cx, in which the affected servers Concurrently eXecute the sub-operations of a cross-server file operation and respond to the client immediately. Unless a sub-operation fails or other clients need to access the updated metadata objects, the commitment is delayed and batched with other commitments. Evaluations of our Cx implementation in a parallel file system demonstrate that Cx can significantly improve the performance of cross-server file operations while retaining good scalability.
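The protocol described above can be illustrated with a minimal, single-process sketch. It assumes in-memory servers and simulates failures with a flag. It omits networking, durable logging, and crash recovery, so it is not the paper's implementation. The names `Server`, `CxCoordinator`, `run`, `read`, and `flush` are hypothetical. The sketch shows three behaviors from the abstract: sub-operations execute concurrently and the client gets an immediate reply, a failed sub-operation is aborted right away, and a commitment is forced early only when another client touches a pending object. Otherwise, commitments are batched.

```python
from concurrent.futures import ThreadPoolExecutor


class Server:
    """Hypothetical metadata server: executed-but-uncommitted sub-operations
    stay in a pending table until the coordinator commits or aborts them."""

    def __init__(self, name):
        self.name = name
        self.committed = {}  # key -> value visible to all clients
        self.pending = {}    # op_id -> {key: value} executed, not yet committed

    def execute(self, op_id, updates, fail=False):
        # Run the sub-operation locally, recording it only as pending.
        if fail:  # simulated sub-operation failure
            return False
        self.pending[op_id] = dict(updates)
        return True

    def commit(self, op_id):
        self.committed.update(self.pending.pop(op_id, {}))

    def abort(self, op_id):
        self.pending.pop(op_id, None)

    def pending_op_for(self, key):
        # Return the pending operation that updated `key`, if any.
        for op_id, updates in self.pending.items():
            if key in updates:
                return op_id
        return None


class CxCoordinator:
    """Sketch of Cx-style handling: concurrent execution, immediate reply,
    and delayed, batched commitment (hypothetical API)."""

    def __init__(self):
        self.participants = {}  # op_id -> servers holding its pending updates
        self.commit_rounds = 0  # commit rounds issued, for illustration only

    def run(self, op_id, subops):
        # subops: list of (server, updates, fail) tuples, one per affected server.
        def execute(subop):
            server, updates, fail = subop
            return server.execute(op_id, updates, fail)

        with ThreadPoolExecutor(max_workers=max(1, len(subops))) as pool:
            results = list(pool.map(execute, subops))  # all servers at once
        servers = [server for server, _, _ in subops]
        if not all(results):
            # A failed sub-operation is resolved immediately: abort everywhere.
            for server in servers:
                server.abort(op_id)
            return False
        # Every sub-operation succeeded: reply now, defer the commitment.
        self.participants[op_id] = servers
        return True

    def _commit(self, op_ids):
        # One commit round covering every operation in op_ids.
        for op_id in op_ids:
            for server in self.participants.pop(op_id):
                server.commit(op_id)
        self.commit_rounds += 1

    def read(self, server, key):
        # Another client accessing a pending object forces that commitment first.
        op_id = server.pending_op_for(key)
        if op_id is not None:
            self._commit([op_id])
        return server.committed.get(key)

    def flush(self):
        # Batch all outstanding commitments into a single round.
        if self.participants:
            self._commit(list(self.participants))
```

In this sketch, a read that hits a pending object commits only the conflicting operation. Later operations remain pending until `flush`, which covers all of them in one commit round. When and how often to flush is left open here as an assumption.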
