The Importance of Run-Time Error Detection

The ability of system software to detect serial and parallel run-time errors, and to issue error messages that help programmers fix them quickly, is an important productivity criterion for developing and maintaining application programs. Over ten thousand run-time error tests and a run-time error detection (RTED) evaluation tool have been developed to automatically evaluate run-time error detection capabilities for serial errors and for parallel errors in MPI, OpenMP and UPC programs. Evaluation results, tests and the RTED evaluation tool are freely available at http://rted.public.iastate.edu. Many compilers, tools and run-time systems scored poorly on these tests. The authors make recommendations for providing better RTED in the future.
