Pruners: Providing reproducibility for uncovering non-deterministic errors in runs on supercomputers

Large scientific simulations must be able to exploit the full-system potential of supercomputers. When they tap into high-performance features, however, their execution can become non-deterministic: repeated runs on the same input may behave differently, which significantly hampers application development. Pruners is a new toolset to detect and remedy non-deterministic bugs and errors in large parallel applications. To show the capabilities of Pruners for large application development, we also demonstrate its early usage on real-world production applications.
