The trend toward many-core multiprocessor systems and clusters is making systems with tens or hundreds of processors more widely available. Current manual debugging techniques do not scale well to systems of this size. Advanced automated debugging tools are needed for the standard programming models of commodity computing, such as threads and MPI. We surveyed MPI users to identify the kinds of MPI errors they encounter and classified those errors into several types. We describe how automated tools can detect such errors and present the Intel® Message Checker (IMC) technology being developed at the Intel Advanced Computing Center. IMC automatically detects several kinds of MPI errors, including various types of mismatches, race conditions, deadlocks and potential deadlocks, and resource misuse. Finally, we review the usability and uniqueness of IMC and discuss our future plans.
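To make the idea of automated deadlock detection concrete, the following is a minimal sketch of one common approach: building a wait-for graph from blocked ranks and searching it for a cycle. It is not IMC's algorithm, which the abstract does not specify. The function name `find_deadlock_cycle` and the input format (a mapping from each blocked rank to the single rank it is waiting on) are assumptions made for illustration only.

```python
from typing import Dict, List, Optional


def find_deadlock_cycle(waits_for: Dict[int, int]) -> Optional[List[int]]:
    """Return the ranks forming a wait-for cycle, or None if there is none.

    waits_for maps a blocked rank to the one rank it is waiting on.
    This input format is hypothetical; a real tool would derive it from
    a trace of blocking MPI calls such as MPI_Send and MPI_Recv.
    """
    for start in waits_for:
        path: List[int] = []            # ranks visited from this start
        position: Dict[int, int] = {}   # rank -> index in path
        rank = start
        # Follow the chain of waits until it ends or revisits a rank.
        while rank in waits_for and rank not in position:
            position[rank] = len(path)
            path.append(rank)
            rank = waits_for[rank]
        if rank in position:
            # The chain looped back on itself: these ranks are deadlocked.
            return path[position[rank]:]
    return None


if __name__ == "__main__":
    # Ranks 0 and 1 each block in a receive from the other before sending.
    print(find_deadlock_cycle({0: 1, 1: 0}))
    # Rank 0 waits on 1, which waits on 2; rank 2 is not blocked.
    print(find_deadlock_cycle({0: 1, 1: 2}))
```

A cycle in this graph corresponds to an actual deadlock under the stated one-wait-per-rank assumption. Detecting potential deadlocks, such as sends that complete only because the implementation happens to buffer them, requires additional modeling that this sketch omits.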