Fault-Tolerant File-I/O for Portable Checkpointing Systems

The ftIO system provides portable, fault-tolerant file-I/O by enhancing the ANSI C file-I/O functions without changing their application programming interface and without depending on system-specific implementations of the standard file operations. ftIO is an extension of the porch compiler and its runtime system. The porch compiler automatically generates code that saves bookkeeping information about ftIO's transactional file operations in portable checkpoints, which can be recovered on a binary-incompatible architecture. We developed a new algorithm for supporting transactional file operations in ftIO. Rather than using the well-known two-phase commit protocol, this algorithm guarantees fault tolerance with only a single bit of information and an atomic rename file operation. In this paper, we describe the new ftIO algorithm, discuss design choices for ftIO, and present experimental data from our ftIO prototype.
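The abstract does not spell out the algorithm, so the C sketch below only illustrates the general idea it names: one persisted bit plus an atomic rename can make a file update all-or-nothing without a two-phase commit. It is a generic shadow-file scheme, not a reproduction of the ftIO algorithm. Several assumptions apply. The file names and the functions `write_text`, `file_has`, `write_bit`, `read_bit`, `commit`, and `recover` are hypothetical. The bit is kept in a tiny file, whereas a checkpointing system would store it with its checkpoint bookkeeping. `rename()` is assumed to have POSIX semantics, atomically replacing an existing target, which ISO C does not guarantee. Durability (`fflush`/`fsync`) is ignored.

```c
/* Hypothetical sketch, not the ftIO implementation: make a file update
 * all-or-nothing with a shadow copy, one persisted commit bit, and rename().
 * Assumes POSIX rename() semantics (atomic replacement of an existing
 * target); ISO C leaves that case implementation-defined. Durability
 * (fflush/fsync) is ignored. All names here are illustrative. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Replace the contents of path with text; used for data and shadow files. */
int write_text(const char *path, const char *text)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(text, f) != EOF;
    ok = (fclose(f) == 0) && ok;
    return ok ? 0 : -1;
}

/* Return 1 if the file at path exists and holds exactly text
 * (only the first 255 bytes are compared in this sketch). */
int file_has(const char *path, const char *text)
{
    char buf[256];
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    buf[n] = '\0';
    fclose(f);
    return strcmp(buf, text) == 0;
}

/* The single bit of bookkeeping: "1" means the shadow is committed.
 * An empty or missing bit file reads as 0, so in this sketch a crash
 * while rewriting the bit still leaves a consistent state. */
int write_bit(const char *bitfile, int bit)
{
    return write_text(bitfile, bit ? "1" : "0");
}

int read_bit(const char *bitfile)
{
    return file_has(bitfile, "1");
}

/* Commit a prepared shadow: set the bit, atomically rename the shadow
 * over the data file, then clear the bit. */
int commit(const char *data, const char *shadow, const char *bitfile)
{
    assert(data != NULL && shadow != NULL && bitfile != NULL);
    if (write_bit(bitfile, 1) != 0)
        return -1;
    if (rename(shadow, data) != 0)
        return -1;
    return write_bit(bitfile, 0);
}

/* Run after a crash. Bit set: the commit got past its first step, so
 * finish it by renaming the shadow if it is still there. Bit clear: any
 * shadow is an uncommitted update, so discard it and keep the old data. */
int recover(const char *data, const char *shadow, const char *bitfile)
{
    if (read_bit(bitfile)) {
        FILE *s = fopen(shadow, "r");
        if (s != NULL) {
            fclose(s);
            if (rename(shadow, data) != 0)
                return -1;
        }
        return write_bit(bitfile, 0);
    }
    remove(shadow); /* fails harmlessly if no shadow exists */
    return 0;
}
```

In this scheme, recovery needs only two facts: the bit says whether the update was committed, and the presence of the shadow file says whether the rename already happened. Because `rename()` is the single atomic step, the data file is never left half old and half new.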
