Automated application-level checkpointing based on live-variable analysis in MPI programs

This paper proposes a method for reducing the data saved by application-level checkpointing in MPI programs, based on live-variable analysis. We present the implementation of a source-to-source precompiler (CAC) that automates application-level checkpointing using this optimization. The experimental results show that CAC automates application-level checkpointing correctly and reduces checkpoint data effectively.
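To illustrate the idea behind the optimization, the following is a minimal sketch of textbook live-variable analysis used to choose which variables a checkpoint needs to save. It is not CAC's implementation: the statement representation, the `live_variables` function, and the toy program are all hypothetical, and MPI communication is not modeled. A variable is live at a point if its current value may be read later before being overwritten, so only variables live at the checkpoint location need to be written out.

```python
from typing import List, Set, Tuple

# Hypothetical statement form: (variables defined, variables used, successor indices).
Stmt = Tuple[Set[str], Set[str], List[int]]


def live_variables(stmts: List[Stmt]) -> Tuple[List[Set[str]], List[Set[str]]]:
    """Backward dataflow analysis, iterated to a fixed point.

    live_out[i] = union of live_in[s] over all successors s of i
    live_in[i]  = use[i] | (live_out[i] - def[i])
    """
    n = len(stmts)
    live_in: List[Set[str]] = [set() for _ in range(n)]
    live_out: List[Set[str]] = [set() for _ in range(n)]
    changed = True
    while changed:
        changed = False
        # Visiting statements in reverse order speeds up convergence.
        for i in reversed(range(n)):
            defs, uses, succs = stmts[i]
            out: Set[str] = set()
            for s in succs:
                out |= live_in[s]
            inn = uses | (out - defs)
            if out != live_out[i] or inn != live_in[i]:
                live_out[i], live_in[i] = out, inn
                changed = True
    return live_in, live_out


# Toy straight-line program (assumed for illustration):
#   0: a   = init()
#   1: tmp = a * 2
#   2: b   = tmp + 1
#   3: checkpoint()
#   4: c   = b + a
#   5: print(c)
program: List[Stmt] = [
    ({"a"}, set(), [1]),
    ({"tmp"}, {"a"}, [2]),
    ({"b"}, {"tmp"}, [3]),
    (set(), set(), [4]),
    ({"c"}, {"a", "b"}, [5]),
    (set(), {"c"}, []),
]

live_in, live_out = live_variables(program)
# Only variables live on entry to the checkpoint statement need saving.
to_save = sorted(live_in[3])
print(to_save)
```

In this toy program the checkpoint at statement 3 needs only `a` and `b`. The variable `tmp` is dead by then, so omitting it shrinks the checkpoint without affecting recovery.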
