Symmetric active/active metadata service for highly available cluster storage systems

In a typical distributed storage system, metadata is stored and managed by dedicated metadata servers. One way to improve the availability of such systems is to deploy multiple metadata servers. Past research has focused on the active/standby model, in which each active server has at least one redundant idle backup. However, depending on the replication technique used, service may be interrupted and service state may be lost during a fail-over. The research in this paper targets the symmetric active/active replication model, which uses multiple redundant service nodes running in virtual synchrony. In this model, a service node failure does not cause a fail-over to a backup, and there is no disruption of service or loss of service state. We use a fast delivery protocol to reduce the latency of total order broadcast. Our prototype implementation shows that high availability of metadata servers can be achieved with an acceptable performance trade-off using the active/active metadata server solution.
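The core idea of symmetric active/active replication can be illustrated with a minimal in-process sketch. It is not the paper's prototype and does not implement the fast delivery protocol: it assumes a toy fixed sequencer as the total order broadcast, and the names `MetadataReplica`, `Sequencer`, and `demo` are hypothetical. It only shows that replicas applying the same operations in the same order hold identical state, so a node failure requires no fail-over step.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataReplica:
    """Hypothetical active metadata server holding a full copy of the namespace."""
    name: str
    alive: bool = True
    namespace: dict = field(default_factory=dict)
    next_seq: int = 0

    def deliver(self, seq, op, path, size=None):
        # Apply operations strictly in sequence order so all replicas agree.
        assert seq == self.next_seq, "gap in total order"
        if op == "create":
            self.namespace[path] = size
        elif op == "delete":
            self.namespace.pop(path, None)
        self.next_seq += 1


class Sequencer:
    """Toy total order broadcast (assumption): one global sequence number per request."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.seq = 0

    def broadcast(self, op, path, size=None):
        # Every live replica receives the same operation with the same number.
        for replica in self.replicas:
            if replica.alive:
                replica.deliver(self.seq, op, path, size)
        self.seq += 1


def demo():
    replicas = [MetadataReplica(f"mds{i}") for i in range(3)]
    tob = Sequencer(replicas)
    tob.broadcast("create", "/a", 1)
    tob.broadcast("create", "/b", 2)
    replicas[0].alive = False  # one node fails; the others simply keep serving
    tob.broadcast("delete", "/a")
    return [r.namespace for r in replicas if r.alive]
```

Calling `demo()` returns the namespaces of the two surviving replicas, which are identical. A real deployment would replace the single sequencer, itself a single point of failure, with a group communication system that provides virtual synchrony.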
