MPIRace-Check: Detection of Message Races in MPI Programs

Message races can make the execution of a parallel program nondeterministic, and nondeterminism makes parallel programs difficult to debug, so such races should be detected. Some existing tools detect message races in MPI programs, but they do not provide practical information for locating and debugging them. In this paper, we present MPIRace-Check, an on-the-fly detection tool for debugging MPI programs written in C. MPIRace-Check detects and reports all race conditions in all processes by checking the concurrency of the communication between processes. It reports each message race together with practical information: the source line numbers, the process numbers, and the channel information involved in the race. This information lets programmers distinguish unintended races from the other reported races and shows them directly where the races occur in a large source code. Experiments with a set of test programs show that MPIRace-Check detects the races and that the tool is efficient.
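The abstract does not say how the concurrency of two communications is decided. As a minimal sketch, assuming a vector-clock scheme (a common choice for this kind of check, not confirmed here as the mechanism inside MPIRace-Check), two sends to the same receiver can race when neither send happened before the other. The names `vclock`, `vc_tick`, `vc_receive`, `vc_before`, and `vc_concurrent`, and the fixed process count `NPROCS`, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NPROCS 3 /* hypothetical fixed number of processes for this sketch */

/* One logical counter per process (vector clock). */
typedef struct {
    unsigned t[NPROCS];
} vclock;

/* Record a local event (for example a send) on process `self`. */
void vc_tick(vclock *c, int self) {
    assert(self >= 0 && self < NPROCS);
    c->t[self]++;
}

/* Record a receive on process `self` of a message stamped with `msg`:
 * take the componentwise maximum, then count the receive event itself. */
void vc_receive(vclock *c, const vclock *msg, int self) {
    for (int i = 0; i < NPROCS; i++) {
        if (msg->t[i] > c->t[i]) {
            c->t[i] = msg->t[i];
        }
    }
    vc_tick(c, self);
}

/* True if event a happened before event b:
 * a <= b in every component and a < b in at least one. */
bool vc_before(const vclock *a, const vclock *b) {
    bool strictly_less = false;
    for (int i = 0; i < NPROCS; i++) {
        if (a->t[i] > b->t[i]) {
            return false;
        }
        if (a->t[i] < b->t[i]) {
            strictly_less = true;
        }
    }
    return strictly_less;
}

/* True if neither event happened before the other. Intended for the
 * timestamps of two distinct events; identical clocks also return true. */
bool vc_concurrent(const vclock *a, const vclock *b) {
    return !vc_before(a, b) && !vc_before(b, a);
}
```

Under this assumption, if processes 0 and 1 each send to process 2 without any prior communication between them, the two send timestamps are concurrent, so a wildcard receive on process 2 could match either message. If process 1 sends only after receiving a message that causally follows the send from process 0, the two sends are ordered and are not reported as concurrent.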
