A New Computational Model of Optimized Checkpoint Interval

Many applications, sequential or parallel, require a large amount of time to complete, and a failure during execution can cost such applications a significant amount of computation. Checkpointing and rollback is a technique used to minimize this loss in an environment subject to failures. Unfortunately, the checkpointing scheme itself introduces additional overhead: a checkpoint interval that is either too long or too short degrades system performance, whereas a properly chosen interval optimizes it. The difficulty lies in determining the checkpoint interval at which the checkpointing scheme performs best. The great contribution of Vaidya's model is that its equation for the optimal checkpoint interval is independent of the checkpoint latency and of the recovery time the application spends when it rolls back after a failure. This paper introduces a new segment-based model (NSBM), replaces the mean checkpoint overhead of Vaidya's model with mean availability, a quantity that is easier to understand in fault-tolerant computing, and derives a new equation that is likewise independent of checkpoint latency and recovery time. Finally, we present a group of computational results based on experiments and analyze the relation between the two models. We conclude that NSBM is more effective than Vaidya's model with respect to the computation of the checkpoint interval.
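
For background only, the following minimal Python sketch illustrates the classic first-order analysis of the checkpoint-interval problem (Young's approximation); it is not the NSBM or Vaidya equation derived in this paper, and the symbols C (checkpoint overhead), lam (failure rate), and the example values are assumptions chosen purely for illustration. It shows why the optimum can be independent of checkpoint latency and recovery time: in this first-order model those quantities only add terms that do not depend on the interval T.

import math

def expected_overhead(T, C, lam):
    # First-order expected overhead per unit of useful work:
    # the checkpoint overhead C amortized over the interval T, plus an
    # expected re-execution of roughly half an interval per failure.
    # Checkpoint latency and recovery time would only add terms that do
    # not depend on T, so they are omitted here.
    return C / T + lam * T / 2.0

def optimal_interval(C, lam):
    # Closed-form minimizer of the expression above (Young's approximation).
    return math.sqrt(2.0 * C / lam)

# Illustrative (assumed) values: 60 s checkpoint overhead, one failure per day.
C, lam = 60.0, 1.0 / 86400.0
T_opt = optimal_interval(C, lam)
print("optimal interval: %.0f s, overhead at optimum: %.2f%%"
      % (T_opt, 100.0 * expected_overhead(T_opt, C, lam)))

With these assumed values the sketch reports an optimal interval of roughly 3200 seconds; the models compared in this paper refine this analysis while preserving the property that checkpoint latency and recovery time drop out of the optimum.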