Compiler-Assisted Checkpointing

In this paper we present compiler-assisted checkpointing, a new technique which uses static program analysis to optimize the performance of checkpointing. We achieve this performance gain using libckpt, a checkpointing library which implements memory exclusion in the context of user-directed checkpointing. The correctness of user-directed checkpointing is dependent on program analysis and insertion of memory exclusion calls by the programmer. With compiler-assisted checkpointing, this analysis is automated by a compiler or preprocessor. The resulting memory exclusion calls will optimize the performance of checkpointing, and are guaranteed to be correct. We provide a full description of our program analysis techniques and present detailed examples of analyzing three fortran programs. The results of these analyses have been implemented in libckpt, and we present the performance improvements that they yield.

[1]  Alexander V. Veidenbaum,et al.  Detecting redundant accesses to array data , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).

[2]  Keith D. Cooper,et al.  An experiment with inline substitution , 1991, Softw. Pract. Exp..

[3]  M. Moura Silva,et al.  Checkpointing SPMD applications on transputer networks , 1994, Proceedings of IEEE Scalable High Performance Computing Conference.

[4]  Kai Li,et al.  Faster checkpointing with N+1 parity , 1994, Proceedings of IEEE 24th International Symposium on Fault- Tolerant Computing.

[5]  Stuart I. Feldman,et al.  IGOR: a system for program debugging via reversible execution , 1988, PADD '88.

[6]  Terry A. Welch,et al.  A Technique for High-Performance Data Compression , 1984, Computer.

[7]  Alfred V. Aho,et al.  Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.

[8]  W. Kent Fuchs,et al.  CATCH-compiler-assisted techniques for checkpointing , 1990, [1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium.

[9]  Jeffrey F. Naughton,et al.  Low-Latency, Concurrent Checkpointing for Parallel Programs , 1994, IEEE Trans. Parallel Distributed Syst..

[10]  Jeffrey F. Naughton,et al.  Real-time, concurrent checkpoint for parallel programs , 1990, PPOPP '90.

[11]  Steven W. K. Tjiang,et al.  SUIF: an infrastructure for research on parallelizing and optimizing compilers , 1994, SIGP.

[12]  Robert H. B. Netzer,et al.  Optimal tracing and incremental reexecution for debugging long-running programs , 1994, PLDI '94.

[13]  Keshav Pingali,et al.  Dependence-based program analysis , 1993, PLDI '93.

[14]  Jong-Deok Choi,et al.  Automatic construction of sparse data flow evaluation graphs , 1991, POPL '91.

[15]  Monica S. Lam,et al.  Array-data flow analysis and its use in array privatization , 1993, POPL '93.

[16]  Kai Li,et al.  ickp: a consistent checkpointer for multicomputers , 1994, IEEE Parallel & Distributed Technology: Systems & Applications.

[17]  Willy Zwaenepoel,et al.  On the use and implementation of message logging , 1994, Proceedings of IEEE 24th International Symposium on Fault- Tolerant Computing.

[18]  Kai Li,et al.  Libckpt: Transparent Checkpointing under UNIX , 1995, USENIX.

[19]  Peter Steenkiste,et al.  Fail-Safe PVM: A Portable Package for Distributed Programming with Transparent Recovery , 1993 .

[20]  Thomas R. Gross,et al.  Structured dataflow analysis for arrays and its use in an optimizing compiler , 1990, Softw. Pract. Exp..

[21]  L. Lehmann,et al.  Rollback Recovery in Multiprocessor Ring Configurations , 1987, Fehlertolerierende Rechensysteme.

[22]  Rajiv Gupta,et al.  A practical data flow framework for array reference analysis and its use in optimizations , 1993, PLDI '93.

[23]  David Callahan,et al.  The program summary graph and flow-sensitive interprocedual data flow analysis , 1988, PLDI '88.