Asynchronous Receiver-Driven Replay for Local Rollback of MPI Applications
George Bosilca | Aurélien Bouteiller | Nuria Losada
[1] Lorenzo Alvisi, et al. Message logging: pessimistic, optimistic, and causal, 1995, Proceedings of 15th International Conference on Distributed Computing Systems.
[2] Yuhua Tang, et al. A Message Logging Protocol Based on User Level Failure Mitigation, 2013, ICA3PP.
[3] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard, 1994.
[4] Kevin Harms, et al. Characterization of MPI Usage on a Production Supercomputer, 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[5] Leslie Lamport, et al. Time, clocks, and the ordering of events in a distributed system, 1978, CACM.
[6] Ravishankar K. Iyer, et al. Measuring the Resiliency of Extreme-Scale Computing Environments, 2016.
[7] R. Hornung, et al. Hydrodynamics Challenge Problem, 2011.
[8] Gabriel Rodríguez, et al. CPPC: a compiler-assisted tool for portable checkpointing of message-passing applications, 2010.
[9] Thomas Hérault, et al. Post-failure recovery of MPI communication capability, 2013, Int. J. High Perform. Comput. Appl.
[10] George Bosilca, et al. Local rollback for resilient MPI applications with application-level checkpointing and message logging, 2019, Future Gener. Comput. Syst.
[11] Dolores Rexachs, et al. Hybrid Message Pessimistic Logging. Improving current pessimistic message logging protocols, 2017, J. Parallel Distributed Comput.
[12] Laxmikant V. Kalé, et al. Camel: collective-aware message logging, 2015, The Journal of Supercomputing.
[13] George Bosilca, et al. Redesigning the message logging model for high performance, 2010, Concurr. Comput. Pract. Exp.
[14] Haim Avron, et al. Revisiting Asynchronous Linear Solvers: Provable Convergence Rate through Randomization, 2014, IPDPS.
[15] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard, 1994.
[16] Franck Cappello, et al. SPBC: Leveraging the characteristics of MPI HPC applications for scalable checkpointing, 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[17] George Bosilca, et al. Using software-based performance counters to expose low-level open MPI performance information, 2017, EuroMPI/USA.
[18] Harrick M. Vin, et al. The Cost of Recovery in Message Logging Protocols, 2000, IEEE Trans. Knowl. Data Eng.
[19] Laxmikant V. Kalé, et al. Team-Based Message Logging: Preliminary Results, 2010, 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing.
[20] Nicholas J. Higham, et al. Performance analysis of asynchronous Jacobi’s method implemented in MPI, SHMEM and OpenMP, 2014, Int. J. High Perform. Comput. Appl.
[21] E. Wolters, et al. MOCFE-Bone: the 3D MOC mini-application for exascale research, 2013.
[22] Thomas Hérault, et al. Correlated Set Coordination in Fault Tolerant Message Logging Protocols, 2011, Euro-Par.
[23] Ian Karlin, et al. LULESH 2.0 Updates and Changes, 2013.
[24] Jack Dongarra, et al. Performance of asynchronous optimized Schwarz with one-sided communication, 2019, Parallel Comput.
[25] Timothy G. Mattson, et al. The Parallel Research Kernels, 2014, 2014 IEEE High Performance Extreme Computing Conference (HPEC).
[26] Guillaume Mercier, et al. hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing.