Paper information - Execution replay on distributed memory architectures

Execution replay on distributed memory architectures

Debugging parallel programs on MIMD machines is difficult because successive executions of the same program can lead to different behaviors. To solve this problem, a method called execution replay has been introduced, which guarantees that the reexecution of a program is equivalent to the initial execution. Most execution replay techniques proposed so far can be called 'data driven techniques'. Such techniques are relatively easy to implement for the most common communication primitives. However, the time needed to record the large amount of required information is significant and may itself modify the initial execution, in which case execution replay becomes meaningless. Another class, named control driven execution replay, limits the amount of recorded information. The paper presents a control driven solution that realizes execution replay on distributed memory architectures. In contrast to all other proposed approaches, the technique is adapted to nonblocking primitives and does not depend on any particular form of message passing communication.
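The distinction between the two classes can be illustrated with a toy simulation. The sketch below is not the paper's technique; it only shows the general idea of control driven replay under stated assumptions: senders are deterministic, each sender's channel is FIFO, and the only nondeterminism is which sender the receiver hears from next. The names `run`, `record`, and `replay` are hypothetical. A data driven log would store the message contents (`received`), whereas the control driven log stores only the sender order (`order`).

```python
import random
from collections import deque


def run(channels, choose):
    """Deliver every message; `choose` picks the sender to receive from next."""
    # One FIFO queue per sender (assumption: per-sender ordering is preserved).
    queues = {src: deque(msgs) for src, msgs in channels.items()}
    order, received = [], []
    while any(queues.values()):
        ready = sorted(src for src, q in queues.items() if q)
        src = choose(ready)
        order.append(src)                       # control information: who was received
        received.append(queues[src].popleft())  # data information: what was received
    return order, received


def record(channels, seed):
    """Initial execution: arrival order is nondeterministic (modelled by an RNG)."""
    rng = random.Random(seed)
    return run(channels, rng.choice)


def replay(channels, control_log):
    """Reexecution: force receives to follow the recorded sender order."""
    log = iter(control_log)

    def choose(ready):
        src = next(log)
        # A real receiver would wait here until a message from `src` is available.
        assert src in ready
        return src

    return run(channels, choose)[1]


channels = {"p1": ["a", "b"], "p2": ["c"], "p3": ["d", "e"]}
order, received = record(channels, seed=7)
replayed = replay(channels, order)
print(order, replayed == received)
```

The control log here holds one sender identifier per receive, independent of message size, which is why recording it tends to perturb the initial execution less than logging full message contents.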

André Schiper | Eric Leu | Abdel Wahab Zramdini
