Detection of first races for debugging message-passing programs

Message races, which can cause nondeterministic executions of a message-passing program, should be detected for debugging. In particular, detecting the first race to occur in each process is more important than detecting affected races, which may be side effects of nondeterminism. Previous techniques detect these races inefficiently because they require more than two runs of a program. This paper presents an efficient technique that requires only one execution to detect the first race in each process. To this end, we use a new kind of information, called message history, that consists of the send/receive events related to the first race, and we introduce an algorithm that detects the first races using the message history. Experiments with several MPI programs show that our technique correctly detects the first race during an execution.
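To make the notion of a first race concrete, the following is a minimal illustrative sketch and not the message-history algorithm of this paper. It assumes that every event carries a vector clock, that every receive is a wildcard receive (such as one using MPI_ANY_SOURCE), and that a receive is racing when a message delivered later to the same process was sent without causally following that receive. The names `happens_before` and `first_racing_receive` are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Clock = Tuple[int, ...]  # one vector-clock entry per process


def happens_before(a: Clock, b: Clock) -> bool:
    """Return True if the event with clock a causally precedes the event with clock b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def first_racing_receive(receives: Sequence[Clock],
                         sends: Sequence[Clock]) -> Optional[int]:
    """Find the first racing receive of one process (illustrative assumption).

    receives[i] is the clock of the i-th receive in program order, and
    sends[i] is the clock of the send whose message that receive accepted.
    A receive is treated as racing if some later-delivered message was sent
    without causally following it, so that message could have arrived first.
    """
    for i, recv in enumerate(receives):
        for j in range(i + 1, len(sends)):
            if not happens_before(recv, sends[j]):
                return i
    return None


# P1 and P2 send to P0 independently: the first receive of P0 races.
racy = first_racing_receive(
    receives=[(1, 1, 0), (2, 1, 1)],
    sends=[(0, 1, 0), (0, 0, 1)],
)

# P2 sends only after hearing from P0, so its send follows the first receive.
ordered = first_racing_receive(
    receives=[(1, 1, 0), (3, 1, 2)],
    sends=[(0, 1, 0), (2, 1, 2)],
)
```

In the first call the two sends are concurrent, so index 0 is reported as the first race. In the second call the later send causally follows the first receive, so no race is reported. A later race in the same process would be an affected race in the sense above and is deliberately not reported.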
