Efficient Execution Replay for ATHAPASCAN-0 Parallel Programs

ATHAPASCAN-0 programs are executed by a dynamically evolving network of communicating threads. Within a node, threads communicate through shared memory and synchronization primitives; between nodes, they communicate by message passing. Execution replay of ATHAPASCAN-0 programs addresses three sources of non-determinism: synchronization races, promiscuous messages (messages received from an unspecified source), and the varying number of operations that test the completion of non-blocking ATHAPASCAN-0 primitives. The replay mechanism is mainly control-based: besides the results of test operations, only the order of accesses to synchronization functions and the order of arrival of promiscuous messages need to be recorded. The recording is efficient for two reasons. First, Lamport clocks drastically reduce the number of records associated with synchronization operations. Second, the information needed to reproduce a series of unsuccessful tests is reduced to a single record.
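The two record-reduction ideas above can be illustrated with a minimal Python sketch. It is not the ATHAPASCAN-0 implementation: the class `Recorder`, its methods, the replay helper and the log formats are all hypothetical. The Lamport-clock rule used here (log an access only when the thread's clock jumps by more than one, in the style of ROLT-like schemes) is an assumption, since the text does not state the exact rule. A step of exactly one can be reconstructed during replay, so only the jumps need to be stored.

```python
from dataclasses import dataclass, field


@dataclass
class Recorder:
    """Hypothetical per-node trace recorder (illustrative sketch only)."""

    clocks: dict = field(default_factory=dict)      # thread id -> Lamport clock
    obj_clocks: dict = field(default_factory=dict)  # sync object -> clock of its last access
    sync_log: list = field(default_factory=list)    # (thread, old clock, new clock), jumps only
    test_log: list = field(default_factory=list)    # (thread, request, failed test count)
    pending: dict = field(default_factory=dict)     # (thread, request) -> failed tests so far

    def sync_access(self, thread, obj):
        """Advance Lamport clocks on an access to a synchronization object.

        Assumed ROLT-style rule: an access is logged only when the thread's
        clock jumps by more than one; a +1 step needs no record because
        replay can reconstruct it.
        """
        old = self.clocks.get(thread, 0)
        new = max(old, self.obj_clocks.get(obj, 0)) + 1
        if new != old + 1:
            self.sync_log.append((thread, old, new))
        self.clocks[thread] = new
        self.obj_clocks[obj] = new
        return new

    def test(self, thread, request, completed):
        """Record the outcome of a non-blocking test operation.

        Unsuccessful tests are only counted; a single record holding the
        count is written when the request finally completes.
        """
        key = (thread, request)
        if not completed:
            self.pending[key] = self.pending.get(key, 0) + 1
            return False
        self.test_log.append((thread, request, self.pending.pop(key, 0)))
        return True


def replay_test_outcomes(failed):
    """Outcomes a replayed test loop must observe: `failed` failures, then success."""
    return [False] * failed + [True]
```

With this rule, two lock accesses by thread `"A"` followed by one by thread `"B"` on the same object produce a single synchronization record, `("B", 0, 3)`, and three failed polls of a request followed by a successful one produce the single record `("A", "r1", 3)`.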