AI-Ckpt: leveraging memory access patterns for adaptive asynchronous incremental checkpointing
[1] Rolf Riesen, et al. libhashckpt: Hash-Based Incremental Checkpointing Using GPU's, 2011, EuroMPI.
[2] Song Jiang, et al. Transparent, Incremental Checkpointing at Kernel Level: a Foundation for Fault Tolerance for Parallel Computers, 2005, ACM/IEEE SC 2005 Conference (SC'05).
[3] Andrew Warfield, et al. SecondSite: disaster tolerance as a service, 2012, VEE '12.
[4] L. Alvisi, et al. A Survey of Rollback-Recovery Protocols, 2002.
[5] John T. Daly, et al. Application monitoring and checkpointing in HPC: looking towards exascale systems, 2012, ACM-SE '12.
[6] Franck Cappello, et al. A hybrid local storage transfer scheme for live migration of I/O intensive workloads, 2012, HPDC '12.
[7] Franck Cappello, et al. FTI: High performance Fault Tolerance Interface for hybrid systems, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[8] Torsten Hoefler, et al. Characterizing the Influence of System Noise on Large-Scale Applications by Simulation, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[9] Franck Cappello, et al. Scalable Reed-Solomon-Based Reliable Local Storage for HPC Applications on IaaS Clouds, 2012, Euro-Par.
[10] Chao Wang, et al. Hybrid Checkpointing for MPI Jobs in HPC Environments, 2010, 2010 IEEE 16th International Conference on Parallel and Distributed Systems.
[11] Franck Cappello, et al. BlobCR: Efficient checkpoint-restart for HPC applications on IaaS clouds using virtual disk image snapshots, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[12] Brendan Tangney, et al. Scrabble-a distributed application with an emphasis on continuity, 1990, Softw. Eng. J.
[13] Bronis R. de Supinski, et al. Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[14] Peter J. Denning, et al. Working Sets Past and Present, 1980, IEEE Transactions on Software Engineering.
[15] D. Manivannan, et al. A quasi-synchronous checkpointing algorithm that prevents contention for stable storage, 2008, Inf. Sci.
[16] Yuan Xie, et al. Hybrid checkpointing using emerging nonvolatile memories for future exascale systems, 2011, TACO.
[17] Franck Cappello, et al. Damaris: How to Efficiently Leverage Multicore Parallelism to Achieve Scalable, Jitter-free I/O, 2012, 2012 IEEE International Conference on Cluster Computing.
[18] Andrew Warfield, et al. Live migration of virtual machines, 2005, NSDI.
[19] George H. Bryan, et al. The Maximum Intensity of Tropical Cyclones in Axisymmetric Numerical Model Simulations, 2009.
[20] Jason Evans. A Scalable Concurrent malloc(3) Implementation for FreeBSD, 2006.
[21] Khaled Z. Ibrahim, et al. Optimized pre-copy live migration for memory intensive applications, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[22] Bogdan Nicolae, et al. Towards Scalable Checkpoint Restart: A Collective Inline Memory Contents Deduplication Proposal, 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[23] Bogdan Nicolae, et al. On the Benefits of Transparent Compression for Cost-Effective Cloud Data Storage, 2011, Trans. Large Scale Data Knowl. Centered Syst.
[24] Rolf Riesen, et al. Transparent Redundant Computing with MPI, 2010, EuroMPI.
[25] Frank Mueller, et al. Comparing different approaches for Incremental Checkpointing: The Showdown, 2011.
[26] Robert B. Ross, et al. PVFS: A Parallel File System for Linux Clusters, 2000, Annual Linux Showcase & Conference.