A Distributed Algorithm of Fault Recovery for Stateful Failover

In [8], a high-availability framework that uses a Harary graph as its network topology was proposed for stateful failover. That framework has an interesting property: a uniform load can be assigned to each non-faulty node while fault tolerance is maintained. A challenging problem in this context, which was not addressed in [8], is to devise a distributed algorithm for automated fault recovery that exploits the properties of the framework. In this work, we propose a distributed algorithm with low message and round complexity for automated fault recovery in stateful failover. We then prove the correctness of the algorithm using techniques from formal verification. The safety, liveness, and timeliness properties of the algorithm have been verified with the model checker SPIN.
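To make the topology concrete, the following is a minimal sketch of a Harary graph in the sense of [6], restricted to even connectivity k, where the graph is a circulant in which node i is linked to the k/2 nodes on either side of it. It is not the recovery algorithm proposed in this work, and it is not taken from [8]; the function names `harary_even`, `is_connected`, and `survives_any_failures` are hypothetical and chosen for illustration only. The sketch checks two facts: every node has the same degree k, and the graph stays connected after any k - 1 node failures.

```python
from itertools import combinations


def harary_even(k: int, n: int) -> dict[int, set[int]]:
    """Adjacency sets of the Harary graph H(k, n) for even k, with 2 <= k < n.

    Assumption: only the even-k case is covered; odd k needs extra
    diagonal edges that this sketch leaves out.
    """
    if k % 2 != 0 or not 2 <= k < n:
        raise ValueError("this sketch only covers even k with 2 <= k < n")
    half = k // 2
    return {
        i: {(i + d) % n for d in range(1, half + 1)}
        | {(i - d) % n for d in range(1, half + 1)}
        for i in range(n)
    }


def is_connected(adj, removed=frozenset()):
    """Depth-first search over the nodes that are not in `removed`."""
    alive = [v for v in adj if v not in removed]
    if not alive:
        return True
    seen, stack = {alive[0]}, [alive[0]]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in removed and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(alive)


def survives_any_failures(adj, f):
    """True if the graph stays connected whichever f nodes fail (brute force)."""
    return all(is_connected(adj, frozenset(c)) for c in combinations(adj, f))


if __name__ == "__main__":
    graph = harary_even(4, 8)
    print(all(len(nbrs) == 4 for nbrs in graph.values()))
    print(survives_any_failures(graph, 3))
```

The brute-force check enumerates every failure set, so it is only practical for small n. It is meant to illustrate the connectivity property, not to serve as a verification method.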

[1] Bhabani P. Sinha, et al. Hamiltonian Graphs with Minimum Number of Edges for Fault-Tolerant Topologies, 1992, Inf. Process. Lett.

[2] Che-Liang Yang, et al. A Distributed Algorithm for Fault Diagnosis in Systems with Soft Failures, 1988, IEEE Trans. Computers.

[3] Lih-Hsing Hsu, et al. Optimal k-Fault-Tolerant Networks for Token Rings, 2000, J. Inf. Sci. Eng.

[4] Lih-Hsing Hsu, et al. On the Construction of Combined k-Fault-Tolerant Hamiltonian Graphs, 2001, Networks.

[5] Gerard J. Holzmann, et al. The SPIN Model Checker, 2003.

[6] F. Harary. The Maximum Connectivity of a Graph, 1962, Proceedings of the National Academy of Sciences of the United States of America.

[7] Sudhakar M. Reddy, et al. Distributed Fault-Tolerance for Large Multiprocessor Systems, 1980, ISCA '80.

[8] Cauligi S. Raghavendra, et al. Fault-Tolerant Networks Based on the de Bruijn Graph, 1991, IEEE Trans. Computers.

[9] Indranil Saha, et al. Designing Reliable Architecture for Stateful Fault Tolerance, 2006, 2006 Seventh International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT'06).

[10] Gerard J. Holzmann, et al. The SPIN Model Checker: Primer and Reference Manual, 2003.