VeloC: Towards High Performance Adaptive Asynchronous Checkpointing at Large Scale
[1] Song Jiang, et al. Transparent, Incremental Checkpointing at Kernel Level: a Foundation for Fault Tolerance for Parallel Computers, 2005, ACM/IEEE SC 2005 Conference (SC'05).
[2] F. Moore, et al. Polynomial Codes Over Certain Finite Fields, 2017.
[3] Dhabaleswar K. Panda, et al. Triple-H: A Hybrid Approach to Accelerate HDFS on HPC Clusters with Heterogeneous Storage Architecture, 2015, 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.
[4] Franck Cappello, et al. FTI: High performance Fault Tolerance Interface for hybrid systems, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[5] Franck Cappello, et al. Damaris: How to Efficiently Leverage Multicore Parallelism to Achieve Scalable, Jitter-free I/O, 2012, 2012 IEEE International Conference on Cluster Computing.
[6] Bronis R. de Supinski, et al. Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[7] Franck Cappello, et al. AI-Ckpt: leveraging memory access patterns for adaptive asynchronous incremental checkpointing, 2013, HPDC.
[8] Bogdan Nicolae, et al. Leveraging Naturally Distributed Data Redundancy to Reduce Collective I/O Replication Overhead, 2015, 2015 IEEE International Parallel and Distributed Processing Symposium.
[9] Lei Cao, et al. To share or not to share: comparing burst buffer architectures, 2017, SpringSim.
[10] Christopher J. Hughes, et al. Location-aware cache management for many-core processors with deep cache hierarchy, 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[11] David Abrahams, et al. C++ Template Metaprogramming: Concepts, Tools, and Techniques from Boost and Beyond (C++ In-Depth Series), 2004.
[12] Dhabaleswar K. Panda, et al. A Comprehensive Study of MapReduce Over Lustre for Intermediate Data Placement and Shuffle Strategies on HPC Clusters, 2017, IEEE Transactions on Parallel and Distributed Systems.
[13] Bogdan Nicolae, et al. Towards Scalable Checkpoint Restart: A Collective Inline Memory Contents Deduplication Proposal, 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[14] Hal Finkel, et al. HACC: Simulating Sky Surveys on State-of-the-Art Supercomputing Architectures, 2014, arXiv:1410.2805.
[15] Daniel Sánchez, et al. Jenga: Software-defined cache hierarchies, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[16] Bogdan Nicolae, et al. On the Benefits of Transparent Compression for Cost-Effective Cloud Data Storage, 2011, Trans. Large Scale Data Knowl. Centered Syst.
[17] Hal Finkel, et al. HACC, 2016, Commun. ACM.
[18] Frank Mueller, et al. Comparing different approaches for Incremental Checkpointing: The Showdown, 2011.
[19] Parthasarathy Ranganathan, et al. Exploring latency-power tradeoffs in deep nonvolatile memory hierarchies, 2012, CF '12.
[20] Robert B. Ross, et al. Optimizing I/O forwarding techniques for extreme-scale event tracing, 2014, Cluster Computing.
[21] Dhabaleswar K. Panda, et al. A 1 PB/s file system to checkpoint three million MPI tasks, 2013, HPDC.
[22] George Kurian, et al. LDAC, 2016, ACM Trans. Archit. Code Optim.