CprFS: a user-level file system to support consistent file states for checkpoint and restart

Checkpoint and Restart (CPR) is becoming critical to large-scale parallel computers, whose Mean Time Between Failures (MTBF) may be much shorter than the execution times of their applications. A CPR mechanism should be able to store and recover an application's virtual memory, communication, and file states in a consistent way. However, many CPR tools ignore file states, which may cause errors on recovery for applications that perform file operations. Some CPR tools adopt library-based approaches or kernel-level file systems to deal with file states, but they support only a limited set of file operations, which is not sufficient for some applications. Moreover, many library-based approaches are not transparent to user applications because they wrap file APIs. Kernel-level file systems are difficult to deploy in production systems because of the unnecessary overhead they may impose on applications that do not need CPR. In this paper we propose a user-level file system, CprFS, to address these problems. As a file system, CprFS can guarantee transparency to user applications and can conveniently support arbitrary file operations. It can be deployed on demand, per application, to avoid interfering with other applications. Experimental results on micro-benchmarks and real-world applications show that CprFS introduces acceptable overhead and has little impact on checkpointing systems.
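The recovery errors mentioned above can be illustrated with a small sketch. It is not CprFS's mechanism, which the abstract does not describe; it is a toy model in which the hypothetical helpers `run_step`, `checkpoint`, `restart`, and `simulate` stand in for an application and a CPR tool. The application appends one record per step. If a restart restores only the in-memory state, the step executed after the checkpoint is replayed and its record appears twice; if the file is also rolled back to its checkpointed length (one simple way to keep file state consistent for append-only files), the output is correct.

```python
import os
import tempfile

def run_step(path, state):
    """One unit of work: append a record, then advance the in-memory counter."""
    with open(path, "a") as f:
        f.write(f"record {state['step']}\n")
    state["step"] += 1

def checkpoint(path, state):
    """Save a copy of the memory state plus the file length at this instant."""
    return {"state": dict(state), "file_size": os.path.getsize(path)}

def restart(path, ckpt, restore_file):
    """Restore memory; optionally roll the file back to its checkpointed length."""
    if restore_file:
        os.truncate(path, ckpt["file_size"])
    return dict(ckpt["state"])

def simulate(restore_file):
    """Run, checkpoint, run, 'fail', restart, and finish; return the file's lines."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "out.log")
        open(path, "w").close()          # start from an empty file
        state = {"step": 0}
        run_step(path, state)            # writes record 0
        ckpt = checkpoint(path, state)
        run_step(path, state)            # writes record 1 after the checkpoint
        # Simulated failure: in-memory progress since the checkpoint is lost.
        state = restart(path, ckpt, restore_file)
        run_step(path, state)            # step 1 is executed again
        run_step(path, state)            # step 2
        with open(path) as f:
            return f.read().splitlines()

if __name__ == "__main__":
    print("memory only :", simulate(restore_file=False))
    print("memory+file :", simulate(restore_file=True))
```

Truncation handles only appends; overwrites, renames, and deletions after a checkpoint need more machinery, which is the kind of gap the abstract attributes to approaches that support limited types of file operations.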
