THE DESIGN AND IMPLEMENTATION OF A DEBUGGER FOR PARALLEL PROGRAMS BASED ON CHECKPOINT

Supporting the debugging of large-scale, long-running parallel programs requires introducing checkpointing techniques into debuggers for parallel programs. Applying checkpointing raises four problems: transient messages, orphan messages, the domino effect, and livelock. In addition, debuggers for parallel programs must deal with nondeterminism. A deterministic checkpointing technique based on state freezing avoids three of the four checkpointing problems. The remaining problem, transient messages, is solved by message recording, and the nondeterminism of parallel debugging is solved by record/replay. Together, these techniques solve all of the problems that arise when parallel debugging is combined with checkpointing. The main benefit of the approach is that it is simple, clear, and easy to implement. To apply the technique, a debugger called DENNET has been implemented, which can debug parallel programs in rollback mode.
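
As an illustration of the record/replay idea only, the following minimal Python sketch logs the order in which messages from different senders are consumed during a recording run, then enforces the same order during a replay run even if the messages arrive in a different order. The class name RecordReplayChannel and its methods deliver and receive are hypothetical names chosen for this sketch; they are not taken from DENNET, whose actual mechanism is not described here.

```python
class RecordReplayChannel:
    """Hypothetical receive wrapper (an assumption for illustration, not DENNET code).

    Record mode: messages are consumed in arrival order and the sender of
    each consumed message is appended to a log.
    Replay mode: messages are consumed in the logged sender order,
    regardless of the order in which they arrive.
    """

    def __init__(self, log=None):
        self.replaying = log is not None
        self.log = list(log) if self.replaying else []
        self._cursor = 0      # next log entry to enforce during replay
        self._pending = []    # messages that arrived but are not yet consumed

    def deliver(self, sender, payload):
        """Called when a message arrives from the network."""
        self._pending.append((sender, payload))

    def receive(self):
        """Return the next (sender, payload) pair for the debugged process."""
        if not self.replaying:
            # Record mode: take the earliest arrival and remember who sent it.
            sender, payload = self._pending.pop(0)
            self.log.append(sender)
            return sender, payload

        # Replay mode: only a message from the logged sender may be consumed.
        if self._cursor >= len(self.log):
            raise LookupError("replay log exhausted")
        expected = self.log[self._cursor]
        for i, (sender, payload) in enumerate(self._pending):
            if sender == expected:
                del self._pending[i]
                self._cursor += 1
                return sender, payload
        raise LookupError(f"no pending message from {expected!r} yet")
```

In a recording run, a process that consumes a message from B and then one from A produces the log ["B", "A"]. Passing that log to a new RecordReplayChannel makes a later run consume B's message first again, even when A's message arrives earlier, which is what makes a rolled-back execution repeatable in this simplified model.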