Consistent Checkpointing in Distributed Databases: Towards a Formal Approach

Whether for audit or for recovery, data checkpointing is an important problem in distributed database systems. Transactions establish dependence relations among the data checkpoints taken by data object managers. Given an arbitrary set of data checkpoints (containing at least one checkpoint from one data manager, and at most one checkpoint from each data manager), an important question is: "Can these data checkpoints be members of the same consistent global checkpoint?" This paper answers the question by providing a necessary and sufficient condition suited to database systems. To show the usefulness of this condition, two non-intrusive data checkpointing protocols are derived from it. By exhibiting "correspondences", the paper also establishes a bridge between the data object/transaction model and the process/message-passing model.
