Efficient Algorithms for Global Snapshots in Large Distributed Systems

Existing algorithms for global snapshots in distributed systems are not scalable when the underlying topology is complete. There are primarily two classes of existing algorithms for computing a global snapshot. Algorithms in the first class use control messages of size O(1) but require O(N) space and O(N) messages per processor in a network with N processors. Algorithms in the second class use control messages of size O(N) (such as rotating tokens with the vector counter method), use multiple control messages per channel, or require recording of message history. As a result, algorithms in both classes are inefficient in large systems when the logical topology of the communication layer, as in MPI, is complete. In this paper, we propose three scalable algorithms for global snapshots: a grid-based, a tree-based, and a centralized algorithm. The grid-based algorithm uses O(N) space but only O(√N) messages per processor, each of size O(√N). The tree-based and centralized algorithms use only O(1)-size messages. The tree-based algorithm requires O(1) space and O(log N log(W/N)) messages per processor, where W is the total number of messages in transit. The centralized algorithm requires O(1) space and O(log(W/N)) messages per processor. We also give a matching lower bound for this problem. Finally, we present hybrids of the centralized and tree-based algorithms that allow a trade-off between decentralization and message complexity. Our algorithms have applications in checkpointing, detecting stable predicates, and implementing synchronizers.
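
The abstract does not spell out how the proposed algorithms work, so the following is only a minimal sketch of the generic message-counting idea used by counter-based snapshot algorithms such as Mattern's: messages sent before a process records its state are "white", a white message that reaches a process after it has recorded its state belongs to the recorded channel state, and the snapshot is complete once the total number of white messages received equals the total number sent. It is not the grid-based, tree-based, or centralized algorithm of this paper. The class and function names (Process, deficit, run_snapshot) are hypothetical, all processes record their state at the same instant to keep the sketch short, and the coordinator reads every counter directly instead of doing the message-efficient detection the paper's algorithms provide.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Process:
    """One processor in a hypothetical message-counting snapshot sketch."""
    pid: int
    red: bool = False        # becomes True once the local state is recorded
    white_sent: int = 0      # application messages sent while still white
    white_recv: int = 0      # white messages received, before or after recording
    channel_state: list = field(default_factory=list)  # white msgs received while red


def deficit(procs):
    """Coordinator's view: white messages sent but not yet received."""
    return sum(p.white_sent for p in procs) - sum(p.white_recv for p in procs)


def run_snapshot(n, num_msgs, delivered_early, seed=0):
    """Simulate one snapshot; return (W, messages recorded in channels, final deficit)."""
    rng = random.Random(seed)
    procs = [Process(i) for i in range(n)]

    # Every message sent before the snapshot is white; its colour bit is the
    # only control information it needs to carry (O(1) size).
    network = []
    for seq in range(num_msgs):
        src, dst = rng.randrange(n), rng.randrange(n)
        procs[src].white_sent += 1
        network.append((seq, src, dst))
    rng.shuffle(network)

    # Deliver some messages before anyone records its state.
    for seq, src, dst in network[:delivered_early]:
        procs[dst].white_recv += 1

    # Every process records its local state (turns red). Real algorithms
    # stagger this; recording all at once is a simplifying assumption.
    for p in procs:
        p.red = True

    in_transit = deficit(procs)  # W: white messages still in the network

    # Deliver the rest: a white message reaching a red process is part of
    # the recorded state of the channel it arrived on.
    for seq, src, dst in network[delivered_early:]:
        receiver = procs[dst]
        receiver.white_recv += 1
        if receiver.red:
            receiver.channel_state.append((src, seq))

    recorded = sum(len(p.channel_state) for p in procs)
    return in_transit, recorded, deficit(procs)


if __name__ == "__main__":
    w, recorded, final = run_snapshot(n=8, num_msgs=100, delivered_early=60)
    print(w, recorded, final)  # prints: 40 40 0
```

With 100 white messages of which 60 arrive before the states are recorded, the deficit seen at recording time is W = 40, exactly the 40 messages later captured as channel state, and the deficit drops to 0 once they have all arrived. Detecting that zero cheaply, without every processor reporting after every receipt, is where the message bounds stated above come into play.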
