NOPE: A Nondeterministic Program Evaluator

Nondeterminism in parallel programs can lead to different results in successive executions, even when the same input is supplied. Debugging such programs therefore requires some kind of replay technique. During an initial record phase, a program's execution is monitored and information about the events that occur is stored in trace files. During subsequent replay steps, the traces are used to reproduce an equivalent execution. The drawback is that a trace describes only one particular execution and therefore limits the user's analysis to that case. Other execution paths can be analyzed only if corresponding program runs happen to be monitored. The nondeterministic program evaluator NOPE addresses this problem by extending traditional replay to generate other possible execution paths automatically. The idea is to perform combinatorial event manipulation of racing messages on an initial trace in order to enforce different event orders during replay. If each permutation is tested, execution paths with previously unknown results and hidden errors may be revealed.
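
The permutation idea can be illustrated with a minimal Python sketch. It is not NOPE's actual interface: the trace format (a list of `(sender, payload)` pairs received at one wildcard receive) and the functions `replay` and `explore` are hypothetical stand-ins chosen for illustration, and the "program" is a toy whose result depends on message arrival order.

```python
from itertools import permutations


def replay(order):
    """Toy stand-in for one replay run (hypothetical, not NOPE's API).

    The receiver consumes the racing messages in the enforced order;
    the update rule is deliberately order-sensitive.
    """
    value = 0
    for _sender, payload in order:
        value = value * 2 + payload
    return value


def explore(trace):
    """Replay every permutation of the racing messages in the trace.

    Returns a dict mapping each observed result to the list of sender
    orders that produced it.
    """
    results = {}
    for order in permutations(trace):
        senders = [sender for sender, _payload in order]
        results.setdefault(replay(order), []).append(senders)
    return results


# Assumed initial trace: three messages racing at a single receive.
initial_trace = [("P1", 1), ("P2", 2), ("P3", 3)]
recorded_result = replay(initial_trace)
outcomes = explore(initial_trace)

for result, orders in sorted(outcomes.items()):
    print(result, orders)
```

In this toy case the recorded run yields a single result, while exhaustive permutation of the three racing messages exposes six distinct results. The number of permutations grows factorially with the number of racing messages, so a practical tool would need to restrict manipulation to messages that can actually race.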
