Detecting Arbitrary Stable Properties Using Efficient Snapshots

A stable property continues to hold in an execution once it becomes true. Detecting arbitrary stable properties efficiently in distributed executions is still an open problem. The known algorithms for detecting arbitrary stable properties, and the snapshot algorithms used to detect such properties, suffer from one or more of the following drawbacks: they incur the overhead of a large number of messages per global snapshot, alter application message headers, use inhibition, use the execution history, or assume a strong property such as causal delivery of messages in the system. We solve the problem of detecting an arbitrary stable property efficiently under the following assumptions: P1) the application messages should not be modified, not even by timestamps or message coloring; P2) no inhibition is allowed; P3) the algorithm should not use the message history; P4) any process can initiate the algorithm. This paper proposes a family of nonintrusive algorithms requiring 6(n - 1) control messages, where n is the number of processes. A three-phase strategy of uncoordinated observation of local states is used to give a consistent snapshot from which any stable property can be detected. A key feature of our algorithms is that they do not rely on the processes continually and pessimistically reporting their activity. Only the relevant activity that occurs in the thin slice during the algorithm execution needs to be examined.
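To make the definition of a stable property and the stated message bound concrete, the following is a minimal sketch only. It is not the three-phase algorithm proposed in the paper, whose details are not given in this abstract. The names `is_stable_on_trace`, `control_messages`, and the toy trace of (active processes, messages in transit) pairs are hypothetical and chosen purely for illustration.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical toy global state: (number of active processes, messages in transit).
State = Tuple[int, int]


def is_stable_on_trace(states: Sequence[State],
                       prop: Callable[[State], bool]) -> bool:
    """Return True if prop never turns false after having been true on this trace.

    This checks the defining behavior of a stable property on one recorded
    sequence of global states; it says nothing about other executions.
    """
    seen_true = False
    for state in states:
        holds = prop(state)
        if seen_true and not holds:
            return False  # property held earlier and was later violated
        seen_true = seen_true or holds
    return True


def control_messages(n: int) -> int:
    """Control-message count quoted in the abstract: 6(n - 1) for n processes."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 6 * (n - 1)


# Assumed example trace: a message in transit reactivates a process at step 4.
trace = [(3, 1), (2, 2), (1, 0), (0, 1), (1, 0), (0, 0), (0, 0)]


def terminated(state: State) -> bool:
    # No active processes and no messages in transit.
    return state[0] == 0 and state[1] == 0


def no_active(state: State) -> bool:
    # Ignores messages in transit, so it can be falsified again.
    return state[0] == 0


if __name__ == "__main__":
    print(is_stable_on_trace(trace, terminated))
    print(is_stable_on_trace(trace, no_active))
    print(control_messages(4))
```

In this toy trace, `terminated` behaves as a stable property, whereas `no_active` becomes true and then false again when an in-transit message reactivates a process, which is why a snapshot used for detection must account for channel contents as well as local states.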
