Towards Scalable Checkpoint Restart: A Collective Inline Memory Contents Deduplication Proposal
[1] Robert B. Ross, et al. PVFS: A Parallel File System for Linux Clusters, 2000, Annual Linux Showcase & Conference.
[2] Margaret H. Wright, et al. The opportunities and challenges of exascale computing, 2010.
[3] Franck Cappello, et al. BlobCR: Efficient checkpoint-restart for HPC applications on IaaS clouds using virtual disk image snapshots, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[4] Gabriel Antoniu, et al. BlobSeer: Next-generation data management for large scale infrastructures, 2011, J. Parallel Distributed Comput.
[5] Xiaofang Zhao, et al. Performance analysis and optimization of MPI collective operations on multi-core clusters, 2009, The Journal of Supercomputing.
[6] L. Alvisi, et al. A Survey of Rollback-Recovery Protocols, 2002.
[7] Bronis R. de Supinski, et al. Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[8] Kai Li, et al. Avoiding the Disk Bottleneck in the Data Domain Deduplication File System, 2008, FAST.
[9] Yuan Xie, et al. Hybrid checkpointing using emerging nonvolatile memories for future exascale systems, 2011, TACO.
[10] Franck Cappello, et al. Damaris: How to Efficiently Leverage Multicore Parallelism to Achieve Scalable, Jitter-free I/O, 2012, 2012 IEEE International Conference on Cluster Computing.
[11] Sameer Kumar, et al. Collective algorithms for sub-communicators, 2012, ICS '12.
[12] Franck Cappello, et al. Scalable Reed-Solomon-Based Reliable Local Storage for HPC Applications on IaaS Clouds, 2012, Euro-Par.
[13] Michal Kaczmarczyk, et al. HYDRAstor: A Scalable Secondary Storage, 2009, FAST.
[14] Song Jiang, et al. Transparent, Incremental Checkpointing at Kernel Level: a Foundation for Fault Tolerance for Parallel Computers, 2005, ACM/IEEE SC 2005 Conference (SC'05).
[15] John T. Daly, et al. Application monitoring and checkpointing in HPC: looking towards exascale systems, 2012, ACM-SE '12.
[16] Franck Cappello, et al. FTI: High performance Fault Tolerance Interface for hybrid systems, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[17] Lorenzo Alvisi, et al. Reasons for a pessimistic or optimistic message logging protocol in MPI uncoordinated failure, recovery, 2009, 2009 IEEE International Conference on Cluster Computing and Workshops.
[18] André Brinkmann, et al. A study on data deduplication in HPC storage systems, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[19] Frank Mueller, et al. Comparing different approaches for Incremental Checkpointing: The Showdown, 2011.
[20] George H. Bryan, et al. The Maximum Intensity of Tropical Cyclones in Axisymmetric Numerical Model Simulations, 2009.
[21] Jason Evans. A Scalable Concurrent malloc(3) Implementation for FreeBSD, April 2006.
[22] Kurt B. Ferreira, et al. On the Viability of Checkpoint Compression for Extreme Scale Fault Tolerance, 2011, Euro-Par Workshops.
[23] Franck Cappello, et al. Uncoordinated Checkpointing Without Domino Effect for Send-Deterministic MPI Applications, 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[24] David Brink, et al. A (probably) exact solution to the Birthday Problem, 2012.
[25] Rolf Riesen, et al. libhashckpt: Hash-Based Incremental Checkpointing Using GPU's, 2011, EuroMPI.