Execution Replay and Debugging of Distributed Multi-threaded Parallel Programs

Clusters of shared-memory symmetric multiprocessors are increasingly used for high-performance computing. To exploit both the parallelism within nodes and the parallelism between nodes conveniently, programming models for communicating threads are being developed. However, most of these models result in programs that exhibit non-deterministic behavior. This makes cyclic debugging impossible unless an efficient execution replay system is available. This article describes such an execution replay system for distributed thread programming that combines synchronization primitives for threads sharing the same node with communication primitives for threads on different nodes. The execution replay system combines the most efficient trace size reduction technique for shared memory, based on logical clocks, with a very efficient compression technique for the trace data that originates from the test functions used in non-blocking communication.
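
To give an intuition for why trace data from test functions compresses well, the following minimal Python sketch records, for each completed non-blocking operation, only the number of unsuccessful tests that preceded the successful one, and replays that sequence of outcomes. It is an illustration under stated assumptions, not the scheme of the system described here: the class names `ProbeTraceRecorder` and `ProbeTraceReplayer` are hypothetical, a single stream of test outcomes is assumed, and the logical-clock technique for shared-memory synchronization is not modeled.

```python
class ProbeTraceRecorder:
    """Record phase (hypothetical sketch): keep one integer per completed
    operation, namely the number of failed tests before the success."""

    def __init__(self):
        self.trace = []        # compressed trace: failure counts
        self._failures = 0     # failed tests since the last success

    def observe(self, completed: bool) -> bool:
        """Log the outcome of one test call and pass it through unchanged."""
        if completed:
            self.trace.append(self._failures)
            self._failures = 0
        else:
            self._failures += 1
        return completed


class ProbeTraceReplayer:
    """Replay phase (hypothetical sketch): reproduce the recorded outcomes."""

    def __init__(self, trace):
        self._trace = list(trace)
        self._index = 0
        self._remaining = self._trace[0] if self._trace else 0

    def next_result(self) -> bool:
        """Return the outcome the next test call must produce."""
        if self._index >= len(self._trace):
            raise RuntimeError("trace exhausted")
        if self._remaining > 0:
            self._remaining -= 1
            return False       # this test failed in the recorded run
        # This test succeeded in the recorded run; move to the next entry.
        self._index += 1
        if self._index < len(self._trace):
            self._remaining = self._trace[self._index]
        return True


# Usage: six recorded outcomes are stored as three integers.
outcomes = [False, False, True, True, False, True]
recorder = ProbeTraceRecorder()
for outcome in outcomes:
    recorder.observe(outcome)

replayer = ProbeTraceReplayer(recorder.trace)
replayed = [replayer.next_result() for _ in outcomes]
```

In this sketch, failed tests that are never followed by a success are not written to the trace. A real replay system would also have to wait, at the point where a success was recorded, until the corresponding message has actually arrived.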