Compiler Technology for Portable Checkpoints

We have implemented a prototype compiler called porch that transforms C programs into C programs supporting portable checkpoints. Portable checkpoints capture the state of a computation in a machine-independent format that allows the transfer of computations across binary-incompatible machines. We introduce source-to-source compilation techniques for generating code to save and recover from such portable checkpoints automatically. These techniques instrument a program with code that maps the state of a computation into a machine-independent representation and vice versa. In particular, the following problems are addressed: (1) providing stack environment portability, (2) enabling conversion of complex data types, and (3) rendering pointers portable. Experimental results show that the overhead of checkpointing is reasonably small, even if data representation conversion is required for portability.
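To make problems (2) and (3) concrete, the following is a minimal sketch of the kind of save and restore code such instrumentation might produce for one data type. It is an illustration under stated assumptions, not porch's actual output or checkpoint format: the `struct node` type, the helper names, the big-endian byte layout, and the choice to encode pointers as array indices are all hypothetical. The sketch further assumes that every `next` pointer is either NULL or points into the same array being saved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user type: a value plus a pointer to another node. */
struct node {
    int32_t value;
    struct node *next;
};

/* Assumed wire format: two 32-bit big-endian fields per node. */
#define NODE_WIRE_SIZE 8u
#define PORTABLE_NULL  0xFFFFFFFFu  /* assumed encoding of a NULL pointer */

/* Store a 32-bit value in big-endian order, independent of host byte order. */
static void put_u32(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Load a 32-bit big-endian value. */
static uint32_t get_u32(const unsigned char *in)
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}

/* Save n nodes into buf (at least n * NODE_WIRE_SIZE bytes).
 * Each pointer is replaced by the index of its target within the array,
 * so the saved form contains no machine addresses.
 * Returns the number of bytes written. */
size_t save_nodes(const struct node *nodes, size_t n, unsigned char *buf)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t idx = PORTABLE_NULL;
        if (nodes[i].next != NULL) {
            ptrdiff_t d = nodes[i].next - nodes;  /* target's array index */
            assert(d >= 0 && (size_t)d < n);
            idx = (uint32_t)d;
        }
        put_u32(buf + i * NODE_WIRE_SIZE, (uint32_t)nodes[i].value);
        put_u32(buf + i * NODE_WIRE_SIZE + 4, idx);
    }
    return n * NODE_WIRE_SIZE;
}

/* Rebuild n nodes from buf, turning indices back into pointers that are
 * valid in the restoring process's address space. */
void restore_nodes(struct node *nodes, size_t n, const unsigned char *buf)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t idx = get_u32(buf + i * NODE_WIRE_SIZE + 4);
        nodes[i].value = (int32_t)get_u32(buf + i * NODE_WIRE_SIZE);
        if (idx == PORTABLE_NULL) {
            nodes[i].next = NULL;
        } else {
            assert(idx < n);
            nodes[i].next = &nodes[idx];
        }
    }
}
```

A caller would pass an array of nodes and a byte buffer to `save_nodes`, move the buffer to another machine, and call `restore_nodes` there. Because the buffer holds only fixed-width big-endian integers and indices, it does not depend on the byte order, pointer width, or address layout of the machine that wrote it.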
