Improving an MPI Application-Level Migration Approach through Checkpoint File Splitting
[1] Miroslaw Malek, et al. A survey of online failure prediction methods, 2010, CSUR.
[2] Dhabaleswar K. Panda, et al. RDMA-Based Job Migration Framework for MPI over InfiniBand, 2010, 2010 IEEE International Conference on Cluster Computing.
[3] Dhabaleswar K. Panda, et al. High Performance Pipelined Process Migration with RDMA, 2011, 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.
[4] Roberto R. Osorio, et al. Improving Scalability of Application-Level Checkpoint-Recovery by Reducing Checkpoint Sizes, 2013, New Generation Computing.
[5] Gabriel Rodríguez, et al. CPPC: a compiler‐assisted tool for portable checkpointing of message‐passing applications, 2010, Concurr. Comput. Pract. Exp.
[6] Gabriel Rodríguez, et al. Failure Avoidance in MPI Applications Using an Application-Level Approach, 2014, Comput. J.
[7] Laxmikant V. Kalé, et al. Proactive Fault Tolerance in MPI Applications Via Task Migration, 2006, HiPC.
[8] Gabriel Rodríguez, et al. Analysis of Performance-impacting Factors on Checkpointing Frameworks: The CPPC Case Study, 2011, Comput. J.
[9] L. Alvisi, et al. A Survey of Rollback-Recovery Protocols, 2002.
[10] Gabriel Rodríguez, et al. In-memory application-level checkpoint-based migration for MPI programs, 2014, The Journal of Supercomputing.
[11] Cong Du, et al. MPI-Mitten: Enabling Migration Technology in MPI, 2006, Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06).
[12] Chao Wang, et al. Proactive process-level live migration in HPC environments, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[13] Fei Meng, et al. Functional Partitioning to Optimize End-to-End Performance on Many-core Architectures, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[14] Barry V. Hess, et al. Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, 2010, HiPC 2010.
[15] Rajendra Singh, et al. Performance Driven Partial Checkpoint/Migrate for LAM-MPI, 2008, 2008 22nd International Symposium on High Performance Computing Systems and Applications.
[16] Chao Wang, et al. Proactive process-level live migration and back migration in HPC environments, 2012, J. Parallel Distributed Comput.