Correcting Errors in Message Passing Systems

We present an algorithm for correcting communication errors using delivered and undelivered messages. It suggests corrective measures for errors caused by typographical mistakes in programs written for message passing systems such as PVM and MPI. The paper focuses on the validity of the algorithm, proving that for a nontrivial number of errors the algorithm can suggest changes that correct them. The algorithm has been implemented as a tool in Millipede (Multi Level Interactive Parallel Debugger), a support environment developed to help programmers debug message passing programs at different abstraction levels.
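The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea, not the method from the paper. It assumes that each undelivered send and each pending receive can be described by a (source, destination, tag) envelope, and that a typographical error shows up as a small number of differing envelope fields. The names `Send`, `Recv`, `mismatches`, `suggest_corrections`, and the `max_diff` parameter are all hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Send:
    """Envelope of a send that was never delivered (hypothetical model)."""
    src: int  # rank of the sending process
    dst: int  # rank the message was addressed to
    tag: int  # message tag used by the sender


@dataclass(frozen=True)
class Recv:
    """Envelope of a receive that never completed (hypothetical model)."""
    dst: int  # rank of the process that posted the receive
    src: int  # rank it expected the message from
    tag: int  # message tag it expected


def mismatches(send, recv):
    """Return the names of the envelope fields on which send and recv disagree."""
    diffs = []
    if send.src != recv.src:
        diffs.append("src")
    if send.dst != recv.dst:
        diffs.append("dst")
    if send.tag != recv.tag:
        diffs.append("tag")
    return diffs


def suggest_corrections(undelivered, pending, max_diff=1):
    """Pair each undelivered send with each pending receive and keep the
    pairs that would match if at most `max_diff` fields were changed.

    Assumption: a pair that differs in only a few fields is a plausible
    victim of a typo. This is a heuristic, not a proof of the cause.
    """
    suggestions = []
    for send, recv in product(undelivered, pending):
        diffs = mismatches(send, recv)
        if 0 < len(diffs) <= max_diff:
            suggestions.append((send, recv, diffs))
    return suggestions
```

For example, an undelivered `Send(src=0, dst=1, tag=7)` and a pending `Recv(dst=1, src=0, tag=9)` differ only in the tag, so the sketch would report that pair with `["tag"]` as the field to inspect. A real tool would also need the delivered messages, which this sketch ignores.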
