Design and implementation of a low-overhead file checkpointing approach

One of the most important capabilities of a checkpointing and recovery technique is file checkpointing, i.e., saving and restoring the state of the user files of a process. The paper describes the design and implementation of a file checkpointing approach called Modification Operation Buffering. This approach buffers all modification operations issued after one checkpoint until the next, so that the operations between two checkpoints are atomic as a whole. By dynamically choosing a suitable size for the memory buffer and by hiding the latency of flushing the buffer, the approach achieves lower overhead than other approaches.
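The buffering idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class `BufferedFile`, its methods, and the `demo` function are hypothetical names chosen for illustration, and the sketch assumes a single file accessed through explicit offsets. It omits the dynamic buffer sizing and the latency hiding mentioned above, and its flush is not itself crash-atomic.

```python
import os
import tempfile


class BufferedFile:
    """Hypothetical wrapper: holds writes in memory until the next checkpoint."""

    def __init__(self, path):
        self.path = path
        self.pending = []  # (offset, data) writes issued since the last checkpoint

    def write(self, offset, data):
        # Record the operation instead of touching the file on disk.
        self.pending.append((offset, bytes(data)))

    def read_all(self):
        # The process must see its own buffered writes, so overlay them
        # on the on-disk contents in the order they were issued.
        with open(self.path, "rb") as f:
            image = bytearray(f.read())
        for offset, data in self.pending:
            end = offset + len(data)
            if end > len(image):
                image.extend(b"\0" * (end - len(image)))
            image[offset:end] = data
        return bytes(image)

    def checkpoint(self):
        # Apply the buffered operations to the file, then clear the buffer.
        with open(self.path, "r+b") as f:
            for offset, data in self.pending:
                f.seek(offset)
                f.write(data)
            f.flush()
            os.fsync(f.fileno())
        self.pending.clear()

    def rollback(self):
        # Recovery: drop everything issued since the last checkpoint. The
        # file on disk still holds the state saved at that checkpoint.
        self.pending.clear()


def demo():
    # Create a scratch file with known contents.
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello world")
        os.close(fd)

        bf = BufferedFile(path)
        bf.write(6, b"WORLD")
        seen = bf.read_all()            # process view includes the buffered write
        with open(path, "rb") as f:
            on_disk = f.read()          # disk is untouched before the checkpoint

        bf.rollback()                   # simulate recovery to the last checkpoint
        after_rollback = bf.read_all()

        bf.write(0, b"HELLO")
        bf.checkpoint()                 # buffered write reaches the disk
        with open(path, "rb") as f:
            after_checkpoint = f.read()
        return seen, on_disk, after_rollback, after_checkpoint
    finally:
        os.remove(path)
```

In this sketch, rolling back costs nothing beyond discarding the buffer, because the on-disk file is only ever modified at a checkpoint. A real implementation would also need to bound the buffer's memory use and make the flush itself recoverable.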
