Parallelizing heavyweight debugging tools with mpiecho

Idioms created for debugging execution on single processors and multicore systems have been successfully scaled to thousands of processors, but there is little hope that this class of techniques can continue to scale out to tens of millions of cores. To enable the development of more scalable debugging idioms, we introduce mpiecho, a novel runtime platform that enables cloning of MPI ranks. Given identical execution on each clone, we then show how heavyweight debugging approaches can be parallelized, reducing their overhead to a fraction of the serialized case. We also show how this platform can help isolate the source of hardware-based nondeterministic behavior, and we provide a case study based on a recent processor bug at LLNL. While total overhead depends on the individual tool, we show that the platform itself contributes little: 512x tool parallelization incurs at worst 2x overhead across the NAS Parallel Benchmarks, and hardware fault isolation contributes at worst an additional 44% overhead. Finally, we show how mpiecho can lead to a near-linear reduction in overhead when combined with maid, a heavyweight memory tracking tool provided with Intel's Pin platform. We demonstrate overhead reduction from 1466% to 53% for cg and from 740% to 14% for lu (both class D, 64 processes), using only an additional 64 cores.
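The idea of parallelizing a heavyweight tool across clones can be illustrated with a minimal sketch. It is not mpiecho's API or implementation: the function names, the region labels, and the round-robin policy are all hypothetical, and the overhead model assumes that tool cost divides evenly across clones and ignores platform overhead. Each clone runs the same execution but instruments only its own slice of the work.

```python
def assign_to_clones(work_items, num_clones):
    """Round-robin split: clone i receives items i, i+k, i+2k, ... (k = num_clones)."""
    if num_clones < 1:
        raise ValueError("num_clones must be at least 1")
    return [work_items[i::num_clones] for i in range(num_clones)]


def ideal_overhead(serial_overhead, num_clones):
    """Idealized per-clone tool overhead, assuming cost divides evenly.

    This is an assumption for illustration; real tools carry fixed costs
    and the platform adds its own overhead.
    """
    return serial_overhead / num_clones


# Hypothetical instrumentation targets (e.g. code regions to be tracked).
regions = ["region_%d" % i for i in range(8)]

# Four clones of one rank, each instrumenting a disjoint slice.
slices = assign_to_clones(regions, 4)
for clone_id, clone_regions in enumerate(slices):
    print(clone_id, clone_regions)
```

Because the slices are disjoint and together cover every region, the combined output of the clones corresponds to what a single fully instrumented run would observe, provided execution is identical on each clone.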
