Detecting Unaffected Message Races in Parallel Programs

Detecting unaffected race conditions is important for debugging message-passing programs effectively, because a message race can determine whether other races occur at all. Unfortunately, the previous technique for efficiently detecting unaffected races does not guarantee that every race it reports is actually unaffected. This paper presents a novel technique that manages the states of the detected races by examining whether each received message is affected, continuing until the execution terminates. Our technique is guaranteed to detect unaffected races efficiently, because it maintains the affects-relations among the races throughout the program's execution.
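To make "affected" and "unaffected" concrete, the Python sketch below classifies races in a recorded trace. It is a simplified, post-mortem model, not the paper's on-the-fly technique. It makes several assumptions: every receive is a wildcard receive that could accept any message addressed to its process; happens-before is computed with Fidge/Mattern vector clocks; a receive `r` is a race if some message received later by the same process was sent without `r` happening before that send; and a race is affected if another race's receive happens before it. The names `run_trace`, `find_races`, and `unaffected` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Event:
    kind: str   # "send" or "recv"
    proc: int   # process that executed the event
    msg: str    # message identifier
    vc: tuple   # vector clock of the event

def happens_before(a, b):
    """Vector-clock test for a -> b (assumes each event ticks its own entry)."""
    return all(x <= y for x, y in zip(a.vc, b.vc)) and a.vc != b.vc

def run_trace(nprocs, ops):
    """Replay a trace given in a valid global order and stamp vector clocks."""
    clocks = [[0] * nprocs for _ in range(nprocs)]
    sends = {}   # msg -> (send Event, destination process)
    recvs = []   # receive events in trace order
    for op in ops:
        if op[0] == "send":
            _, src, dst, msg = op
            clocks[src][src] += 1
            sends[msg] = (Event("send", src, msg, tuple(clocks[src])), dst)
        else:
            _, proc, msg = op
            send_ev, dst = sends[msg]
            assert dst == proc, "message received by the wrong process"
            c = clocks[proc]
            for i in range(nprocs):          # merge the sender's clock
                c[i] = max(c[i], send_ev.vc[i])
            c[proc] += 1
            recvs.append(Event("recv", proc, msg, tuple(c)))
    return sends, recvs

def find_races(sends, recvs):
    """A receive races if a later message to the same process could have arrived instead."""
    races = []
    for i, r in enumerate(recvs):
        for later in recvs[i + 1:]:
            if later.proc != r.proc:
                continue
            other_send, _ = sends[later.msg]
            if not happens_before(r, other_send):
                races.append(r)
                break
    return races

def unaffected(races):
    """Keep only races whose receive is not causally after another race's receive."""
    return [r for r in races
            if not any(happens_before(o, r) for o in races if o is not r)]

# P0 and P1 race to P2; P2 then sends m3 to P1, which races with m4 from P0.
ops = [
    ("send", 0, 2, "m1"), ("send", 1, 2, "m2"),
    ("recv", 2, "m1"), ("recv", 2, "m2"),
    ("send", 2, 1, "m3"), ("send", 0, 1, "m4"),
    ("recv", 1, "m3"), ("recv", 1, "m4"),
]
sends, recvs = run_trace(3, ops)
races = find_races(sends, recvs)
print([r.msg for r in races])               # ['m1', 'm3']
print([r.msg for r in unaffected(races)])   # ['m1']
```

In this example the race at the receive of `m3` is affected, because it happens after the race at the receive of `m1`. If the first race had resolved differently, P2's later behaviour, and therefore the second race, might change. An on-the-fly detector cannot compute this in one pass at the end. It has to revise a race's state as new messages arrive, which is the situation the abstract's technique addresses.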
