ParaStack: efficient hang detection for MPI programs at large scale
[1] Torsten Hoefler,et al. Scientific Benchmarking of Parallel Computing Systems: Twelve ways to tell the masses when reporting performance results , 2017 .
[2] F. Petrini,et al. The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8,192 Processors of ASCI Q , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[3] Qi Gao,et al. FlowChecker: Detecting Bugs in MPI Libraries via Message Flow Checking , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[4] Martin Schulz,et al. Accurate application progress analysis for large-scale parallel debugging , 2014, PLDI.
[5] C. Eisenhart,et al. Tables for Testing Randomness of Grouping in a Sequence of Alternatives , 1943 .
[6] Bowen Zhou,et al. Vrisha: using scaling properties of parallel programs for bug detection and localization , 2011, HPDC '11.
[7] Werner Krotz-Vogel,et al. Automated MPI Correctness Checking: What if there was a magic option? , 2007 .
[8] Michael M. Resch,et al. MARMOT: An MPI Analysis and Checking Tool , 2003, PARCO.
[9] Martin Schulz,et al. Large scale debugging of parallel tasks with AutomaDeD , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[10] Dhabaleswar K. Panda,et al. DMTracker: finding bugs in large-scale parallel programs by detecting anomaly in data movements , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[11] Bronis R. de Supinski,et al. Dynamic Software Testing of MPI Applications with Umpire , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[12] Martin Schulz,et al. Scalable temporal order analysis for large scale debugging , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[13] Martin Schulz,et al. Stack Trace Analysis for Large Scale Debugging , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[14] Feng Qin,et al. SyncChecker: Detecting Synchronization Errors between MPI Applications and Libraries , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[15] Ganesh Gopalakrishnan,et al. ISP: a tool for model checking MPI programs , 2008, PPOPP.
[16] David H. Bailey,et al. The NAS Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl..
[17] Jack Dongarra,et al. Toward a New Metric for Ranking High Performance Computing Systems , 2013, Sandia Report.
[18] James Coyle,et al. Deadlock detection in MPI programs , 2002, Concurr. Comput. Pract. Exp..
[19] Victor Samofalov,et al. Automated, scalable debugging of MPI programs with Intel® Message Checker , 2005, SE-HPCS '05.
[20] Christel Baier,et al. Distributed wait state tracking for runtime MPI deadlock detection , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[21] Martin Schulz,et al. AutomaDeD: Automata-based debugging for dissimilar parallel tasks , 2010, 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN).
[22] Franck Cappello,et al. Addressing failures in exascale computing , 2014, Int. J. High Perform. Comput. Appl..
[23] Martin Schulz,et al. Debugging high-performance computing applications at massive scales , 2015, Commun. ACM.
[24] Bronis R. de Supinski,et al. Diagnosis of Performance Faults in Large-Scale MPI Applications via Probabilistic Progress-Dependence Inference , 2015, IEEE Transactions on Parallel and Distributed Systems.
[25] Bowen Zhou,et al. WuKong: automatically detecting and localizing bugs that manifest at large system scales , 2013, HPDC '13.
[26] Jun Wei,et al. MC-Checker: Detecting Memory Consistency Errors in MPI One-Sided Applications , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[27] Barton P. Miller,et al. Problem Diagnosis in Large-Scale Computing Environments , 2006, ACM/IEEE SC 2006 Conference (SC'06).
[28] Haibo Chen,et al. Why software hangs and what can be done with it , 2010, 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN).
[29] Bronis R. de Supinski,et al. Probabilistic diagnosis of performance faults in large-scale parallel applications , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[30] W. Haque. Concurrent Deadlock Detection In Parallel Programs , 2006 .
[31] Martin Schulz,et al. A graph based approach for MPI deadlock detection , 2009, ICS '09.