Guaranteed Mutually Consistent Checkpointing in Distributed Computations

In this paper, we explore the isomorphism between vector time and causality to characterize the consistency of a set of checkpoints in a distributed computation. We present and prove a necessary and sufficient condition for a set of local checkpoints to form a consistent global checkpoint, using the isomorphism between vector time and causality. To the best of our knowledge, this is the first attempt to use the isomorphism for this purpose. The condition leads to a simple algorithm for guaranteed mutually consistent global checkpointing. In our approach, a process can take a checkpoint whenever and wherever it wants, while other related processes may be asked to take an additional checkpoint to ensure mutual consistency. We also show how the condition and the resulting algorithm can be used to obtain maximum and minimum global checkpoints, another important paradigm for distributed applications.
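
As an illustration of how vector time can characterize checkpoint consistency, the following minimal sketch checks the textbook vector-clock condition: a set of local checkpoints, one per process, is consistent if no checkpoint has seen more events of process i than process i's own checkpoint records. This is an assumed, standard formulation and not necessarily the exact condition proved in the paper; the function name `is_consistent` and the list-of-vectors representation are hypothetical choices made for the example.

```python
from typing import Sequence


def is_consistent(checkpoints: Sequence[Sequence[int]]) -> bool:
    """Return True if the checkpoints form a consistent global checkpoint.

    checkpoints[i] is the vector timestamp recorded by process i when it
    took its local checkpoint (assumed representation, one entry per process).
    """
    n = len(checkpoints)
    for i in range(n):
        for j in range(n):
            # If process j's checkpoint knows about an event of process i
            # that happened after i's own checkpoint, then j's checkpoint
            # causally depends on something the cut does not contain.
            if checkpoints[j][i] > checkpoints[i][i]:
                return False
    return True


# Process 1 has seen one event of process 0, which process 0's checkpoint covers.
consistent_cut = [[2, 0, 0], [1, 3, 0], [0, 0, 1]]
# Process 1 has seen two events of process 0, but process 0 checkpointed after one.
inconsistent_cut = [[1, 0, 0], [2, 3, 0], [0, 0, 1]]
```

Under this sketch, an inconsistent pair is the situation in which a related process would have to take an additional checkpoint before the set could be used as a global checkpoint.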