A Non-Forced-Write Atomic Commit Protocol for Cluster File Systems

Distributed metadata consistency is one of the critical issues for metadata clusters in distributed file systems. Existing methods for maintaining metadata consistency generally require several forced log writes, and because synchronous disk I/O is slow, they greatly increase the average response time of metadata operations. This paper presents Dual-Log (DL), an asynchronous atomic commit protocol (ACP) that requires no forced log writes. DL is optimized for distributed metadata operations that involve only two metadata servers. Each server's redo log is sent over the low-latency network and recorded on its counterpart metadata server, so a crashed metadata server can redo its metadata operations from this redundant redo log. Because network latency is much lower than disk I/O latency, DL can significantly improve the performance of the distributed metadata service. A DL prototype is implemented on top of the local journal. Compared with two widely used protocols, EP and S2PC-MP, DL reduces the average response time of distributed metadata operations by about 40% to 60%, and recovery takes only 1 second with 10,000 uncompleted distributed metadata operations.
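The dual-logging idea can be illustrated with a minimal in-memory sketch. It assumes exactly two metadata servers, at most one crash at a time, and a reliable message exchange between them. Every name below (`RedoRecord`, `MetadataServer`, `dual_log_commit`, `recover`) is hypothetical, and the key/value changes stand in for real metadata updates. This is not the authors' implementation.

In the sketch, each server applies its own change and appends it to a local journal that is flushed lazily rather than force-written. It also hands its redo record to the counterpart. After a crash, the server replays the records its counterpart still holds for it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RedoRecord:
    """Hypothetical redo record for one server's part of a distributed operation."""
    txn_id: int
    owner: str  # server whose metadata this record modifies
    key: str    # illustrative metadata key (e.g. a dentry or inode id)
    value: str


@dataclass
class MetadataServer:
    """Hypothetical metadata server with a lazily flushed local journal."""
    name: str
    metadata: dict = field(default_factory=dict)
    journal: list = field(default_factory=list)    # local redo journal, never force-written
    durable_upto: int = 0                          # journal prefix that has reached disk
    peer_redo: dict = field(default_factory=dict)  # txn_id -> counterpart's redo record

    def apply(self, rec: RedoRecord) -> None:
        self.metadata[rec.key] = rec.value
        self.journal.append(rec)  # buffered append; no synchronous disk write

    def flush(self) -> None:
        """Simulate the background (asynchronous) journal flush."""
        self.durable_upto = len(self.journal)

    def crash(self) -> None:
        """Lose all volatile state; rebuild metadata from the flushed journal prefix."""
        self.journal = self.journal[: self.durable_upto]
        self.peer_redo = {}
        self.metadata = {rec.key: rec.value for rec in self.journal}


def dual_log_commit(coord: MetadataServer, part: MetadataServer,
                    txn_id: int, coord_change: tuple, part_change: tuple) -> bool:
    """Each side applies its own change and stores the other side's redo record.
    No forced log write happens anywhere on this path (assumed reliable messaging)."""
    rec_c = RedoRecord(txn_id, coord.name, *coord_change)
    rec_p = RedoRecord(txn_id, part.name, *part_change)
    # Message coord -> part: part keeps coord's redo record, then applies its own change.
    part.peer_redo[txn_id] = rec_c
    part.apply(rec_p)
    # Reply part -> coord: coord keeps part's redo record, then applies its own change.
    coord.peer_redo[txn_id] = rec_p
    coord.apply(rec_c)
    return True


def recover(crashed: MetadataServer, survivor: MetadataServer) -> int:
    """Replay the crashed server's redo records held by its counterpart,
    skipping any whose effects already reached the crashed server's disk."""
    durable = {(rec.txn_id, rec.key) for rec in crashed.journal}
    replayed = 0
    for txn_id in sorted(survivor.peer_redo):
        rec = survivor.peer_redo[txn_id]
        if rec.owner == crashed.name and (txn_id, rec.key) not in durable:
            crashed.apply(rec)
            replayed += 1
    return replayed


if __name__ == "__main__":
    a, b = MetadataServer("mds-a"), MetadataServer("mds-b")
    # Illustrative create: the dentry lives on mds-a, the inode on mds-b.
    dual_log_commit(a, b, 1, ("/dir/f", "inode 42@mds-b"), ("inode 42", "allocated"))
    a.crash()  # mds-a never flushed its journal, so its change is lost locally
    print("replayed:", recover(a, b), "->", a.metadata)
```

If both servers lose their unflushed state at once, the redundant copies are lost too. That is why the sketch assumes single-server failures. The abstract does not describe when a counterpart may discard a stored redo record (for example, once the owner's journal has been flushed), so the sketch leaves that out.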
