Semantics driven dynamic partial-order reduction of MPI-based parallel programs

Most distributed parallel programs in the high performance computing (HPC) arena are written using the MPI library. There is growing interest in using model checking for debugging these MPI programs. In this context, partial-order reduction has considerable potential for containing state explosion, given the distributed memory nature of MPI programs. This potential is largely unmet. In this paper, we first define the formal semantics for a non-trivial subset of MPI. We then prove independence theorems based on the formal semantics, paving the way to a semantically clear and general partial-order reduction approach for MPI. Our work describes, for the first time, the exact dependencies between MPI non-blocking send operations and their tests for completion, namely wait and test. We also offer a cleaner solution than previous work for MPI wildcard receives, a proper handling of which requires knowledge of the future course of computations. We show that Flanagan and Godefroid's dynamic partial-order reduction algorithm offers a natural way to handle the need for future information. Our initial experimental results are encouraging.

[1]  Patrice Godefroid, et al.  Partial-Order Methods for the Verification of Concurrent Systems, 1996, Lecture Notes in Computer Science.

[2]  George S. Avrunin.  Analysis of MPI Programs, 2003.

[3]  George S. Avrunin, et al.  Modeling wildcard-free MPI programs for verification, 2005, PPoPP.

[4]  Yu Yang, et al.  Gauss: A Framework for Verifying Scientific Computing Software, 2006, Electron. Notes Theor. Comput. Sci.

[5]  Antti Valmari, et al.  A stubborn attack on state explosion, 1990, Formal Methods Syst. Des.

[6]  MPI Forum.  MPI: A Message-Passing Interface, 1994.

[7]  Stephan Merz, et al.  Model Checking, 2000.

[8]  Ganesh Gopalakrishnan, et al.  Verification of MPI programs using SPIN, 2004.

[9]  Stephen F. Siegel.  Model Checking Nonblocking MPI Programs, 2007, VMCAI.

[10]  Matthew B. Dwyer, et al.  Bogor: an extensible and highly-modular software model checking framework, 2003, ESEC/FSE-11.

[11]  Gerard J. Holzmann, et al.  The Model Checker SPIN, 1997, IEEE Trans. Software Eng.

[12]  Rajeev Thakur, et al.  Formal Verification of Programs That Use MPI One-Sided Communication, 2006, PVM/MPI.

[13]  Patrice Godefroid, et al.  Dynamic partial-order reduction for model checking software, 2005, POPL '05.

[14]  George S. Avrunin, et al.  Verification of MPI-Based Software for Scientific Computation, 2004, SPIN.

[15]  Dragan Bosnacki, et al.  Cluster-Based Partial-Order Reduction, 2004.

[16]  Stephen F. Siegel.  Efficient Verification of Halting Properties for MPI Programs with Wildcard Receives, 2005, VMCAI.

[17]  Victor Samofalov, et al.  Automated, scalable debugging of MPI programs with Intel® Message Checker, 2005, SE-HPCS '05.

[18]  Patrice Godefroid, et al.  Model checking for programming languages using VeriSoft, 1997, POPL '97.

[19]  Jakob Rehof, et al.  Zing: A Model Checker for Concurrent Software, 2004, CAV.

[20]  M. Robby, et al.  Bogor: An Extensible and Highly Modular Model Checking Framework, 2003.

[21]  Message Passing Interface Forum.  MPI: A Message-Passing Interface Standard, 1994.

[22]  R. Kirby, et al.  The Communication Semantics of the Message Passing Interface, 2006.

[23]  Alain J. Martin, et al.  Slack Elasticity in Concurrent Computing, 1998, MPC.

[24]  Message Passing Interface Forum, et al.  MPI: A Message-Passing Interface Standard, 1994.