Derivation of a Termination Detection Algorithm for Distributed Computations

The purpose of this paper is twofold, viz. to present a new [0] algorithm for the detection of the termination of a distributed computation and to demonstrate how the algorithm can be derived in a number of steps.
We consider N machines, each of which is either active or passive. Only active machines send so-called ‘messages’ to other machines; message transmission is considered instantaneous. After having received a message, a machine is active; the receipt of a message is the only mechanism that triggers for a passive machine its transition to activity. For each machine, the transition from the active to the passive state may occur ‘spontaneously’. From the above it follows that the state in which all machines are passive is stable: the distributed computation with which the messages are associated is said to have terminated.
The purpose of the algorithm to be designed is to enable one of the machines, machine nr.0 say, to detect that this stable state has been reached; it is furthermore required that the detection algorithm can cope with any distribution of the activity at the moment machine nr.0 initiates the detection algorithm. For brevity’s sake we shall denote the process by which termination is to be detected by the ‘probe’.
The probe obviously has to involve, in some way or another, all the other machines. Two orderly configurations present themselves: an (N−1)-point star with machine nr.0 at its centre or the N machines arranged in a ring. Since the latter gives rise to less signalling traffic, we adopt the circular arrangement; more precisely, we assume the availability of communication facilities such that
(i) machine nr.0 can initiate the probe by sending a signal to machine nr.N−1,
(ii) machine nr.i+1 can propagate the probe around the ring by sending a signal to machine nr.i.
These signalling facilities are assumed to be available, irrespective of the facilities for message sending.
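The machine model described above can be made concrete with a small sketch. It is only an illustration of the stated rules, not part of the detection algorithm; the names `Machine`, `send_message` and `terminated` are assumptions introduced here and do not appear in the paper.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """One of the N machines (illustrative name, not from the paper)."""
    active: bool = False

    def become_passive(self) -> None:
        # The transition from active to passive may occur 'spontaneously'.
        self.active = False

    def receive(self) -> None:
        # After having received a message, a machine is active; this is the
        # only mechanism by which a passive machine becomes active.
        self.active = True


def send_message(machines, sender, receiver):
    """Instantaneous message transmission; only active machines may send."""
    if not machines[sender].active:
        raise ValueError("only active machines send messages")
    machines[receiver].receive()


def terminated(machines):
    """True when all machines are passive, i.e. the stable state."""
    return all(not m.active for m in machines)
```

Once `terminated(machines)` holds, every call to `send_message` is rejected, so no machine can become active again; this is the sense in which the all-passive state is stable.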
Note that being passive (with respect to the distributed computation proper) does not prevent a machine from partaking in the above signalling. The propagation of the probe around the ring allows us to describe that probe as sending a token around the ring. The token being returned to machine nr.0 will be an essential component of the justification of the conclusion that all machines are passive. As usual, the system state will be captured by an invariant, P say. In the sequel P will be constructed in a number of steps, each step consisting of an extension of the state space considered and an appropriate adjustment of P.
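As an illustration of the ring signalling only, the route taken by the token can be sketched as follows. The function names `probe_successor` and `token_route` are hypothetical, and the sketch deliberately says nothing about the invariant P or about what the token carries, both of which are developed in the sequel.

```python
def probe_successor(i: int, n: int) -> int:
    """Machine to which machine nr.i sends the probe signal.

    Machine nr.0 signals machine nr.N-1, and machine nr.i+1 signals
    machine nr.i, so the token travels in decreasing machine order.
    """
    return (i - 1) % n


def token_route(n: int) -> list:
    """Machines visited by the token, starting and ending at machine nr.0."""
    route = [0]
    current = probe_successor(0, n)
    while current != 0:
        route.append(current)
        current = probe_successor(current, n)
    route.append(0)  # the token is returned to machine nr.0
    return route
```

For N = 4, `token_route(4)` yields `[0, 3, 2, 1, 0]`: one signal per machine, which is the signalling traffic of a single round of the probe in this arrangement.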