Distributed Reset

We design a reset subsystem that can be embedded in an arbitrary distributed system so that its processes can reset the system when necessary. The design is layered and has three main components: leader election, spanning-tree construction, and a diffusing computation. Each component is self-stabilizing in the following sense. If coordination among the up-processes in the system is ever lost (because processes or channels fail or are repaired), the component eventually reaches a state in which coordination is regained. This makes the reset subsystem highly robust: it tolerates fail-stop failures and repairs of processes and channels, even while a reset is in progress.
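The three layers can be illustrated with a small simulation. This is a minimal sketch, not the algorithm of this paper. It assumes synchronous rounds in which every process reads its neighbors' states, and a known upper bound n_max on the number of processes. The names Node, stabilize_tree and diffuse_reset are hypothetical.

Leader election and spanning-tree construction are fused into one common self-stabilizing rule. Each process adopts the neighbor claim with the largest root id, breaking ties by the smaller distance. Claims whose distance reaches n_max are discarded, which eventually flushes out roots that do not exist. The diffusing computation is simulated sequentially: a reset wave travels down the tree, and a completion wave returns to the root.

```python
from dataclasses import dataclass


@dataclass
class Node:
    root: int    # id of the process this process believes is the leader
    parent: int  # parent in the spanning tree (own id if it believes it is the root)
    dist: int    # believed hop distance to the root


def stabilize_tree(graph, state, n_max, max_rounds=1000):
    """Synchronous rounds of max-id leader election fused with BFS tree
    construction (an illustrative rule, assumed here); returns once no
    process changes its (root, parent, dist)."""
    for _ in range(max_rounds):
        new_state = {}
        for p, nbrs in graph.items():
            best = Node(p, p, 0)  # by default, p proposes itself as root
            for q in nbrs:
                s = state[q]
                if s.dist + 1 >= n_max:
                    continue  # distance bound exceeded: treat as a stale (fake) root claim
                # Prefer a larger root id; for equal ids, prefer a shorter distance.
                if (s.root, -(s.dist + 1)) > (best.root, -best.dist):
                    best = Node(s.root, q, s.dist + 1)
            new_state[p] = best
        if new_state == state:
            return state
        state = new_state
    raise RuntimeError("tree did not stabilize within max_rounds")


def diffuse_reset(tree, app_state):
    """Sequentially simulate a diffusing computation over the tree: the reset
    wave goes down from the root, and each process reports completion only
    after all of its children have completed."""
    children = {p: [] for p in tree}
    root = None
    for p, s in tree.items():
        if s.parent == p:
            root = p
        else:
            children[s.parent].append(p)
    completed = []

    def visit(p):
        app_state[p] = 0  # reset the application state at p
        for c in children[p]:
            visit(c)  # forward the reset wave to each child
        completed.append(p)  # p completes after all of its children

    visit(root)
    return root, completed


# Example: a connected 5-process graph starting from a corrupted state that
# includes claims for nonexistent roots 99 and 7.
graph = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
corrupt = {1: Node(99, 2, 1), 2: Node(99, 1, 1), 3: Node(3, 3, 0),
           4: Node(7, 5, 2), 5: Node(5, 5, 0)}
tree = stabilize_tree(graph, corrupt, n_max=len(graph))
app = {p: 10 * p for p in graph}
root, completed = diffuse_reset(tree, app)
print(root, completed)  # 5 [1, 2, 3, 4, 5]
```

From the corrupted starting state, the tree settles on process 5 (the largest id) as root, with hop distances computed from it. The root is the last process to report completion, which is what lets it detect that the reset has finished everywhere. The sketch does not model failures or repairs that occur while a reset wave is in progress. The design described above is intended to tolerate those.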