A Survey of Distributed Database Checkpointing

Checkpointing a database is a vital technique for reducing recovery time after a failure. For distributed databases, checkpointing also provides an efficient way to perform global reconstruction. In this paper, we survey and classify previous approaches to checkpointing a distributed database. Because global reconstruction is rarely needed in most distributed databases, we recommend that an integrated distributed database system use a less restrictive and less resource-intensive checkpointing approach rather than a transaction-consistent one. For a federated or multidatabase system, any type of globally consistent checkpoint is difficult to achieve without violating local autonomy.
