Optimizing checkpoint data placement with guaranteed burst buffer endurance in large-scale hierarchical storage systems
[1] Henri Casanova,et al. Checkpointing strategies for parallel jobs , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[2] Bianca Schroeder,et al. The Computer Failure Data Repository (CFDR): collecting, sharing and analyzing failure data , 2006, SC.
[3] Feng Chen,et al. Hystor: making the best use of solid state drives in high performance storage systems , 2011, ICS '11.
[4] Tei-Wei Kuo,et al. A file-system-aware FTL design for flash-memory storage systems , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.
[5] Robert B. Ross,et al. On the role of burst buffers in leadership-class storage systems , 2012, 2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST).
[6] Qing Yang,et al. I-CASH: Intelligently Coupled Array of SSD and HDD , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[7] Xiaoming Zhang,et al. Hybrid hierarchy storage system in MilkyWay-2 supercomputer , 2014, Frontiers of Computer Science.
[8] Bianca Schroeder,et al. Checkpoint/restart in practice: When ‘simple is better’ , 2014, 2014 IEEE International Conference on Cluster Computing (CLUSTER).
[9] Sungjin Lee,et al. Lifetime management of flash-based SSDs using recovery-aware dynamic throttling , 2012, FAST.
[10] Jason Duell,et al. Berkeley Lab Checkpoint/Restart (BLCR) for Linux Clusters , 2006.
[11] John T. Daly,et al. A higher order estimate of the optimum checkpoint interval for restart dumps , 2006, Future Gener. Comput. Syst..
[12] John W. Young,et al. A first order approximation to the optimum checkpoint interval , 1974, CACM.
[13] Bianca Schroeder,et al. To checkpoint or not to checkpoint: Understanding energy-performance-I/O tradeoffs in HPC checkpointing , 2014, 2014 IEEE International Conference on Cluster Computing (CLUSTER).
[14] Devesh Tiwari,et al. A practical approach to reconciling availability, performance, and capacity in provisioning extreme-scale storage systems , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[15] Teng Wang,et al. BurstMem: A high-performance burst buffer system for scientific applications , 2014, 2014 IEEE International Conference on Big Data (Big Data).
[16] Rina Panigrahy,et al. Design Tradeoffs for SSD Performance , 2008, USENIX ATC.
[17] Mahesh Balakrishnan,et al. Extending SSD Lifetimes with Disk-Based Write Caches , 2010, FAST.
[18] Fabio Margaglia,et al. Extending SSD lifetime in database applications with page overwrites , 2013, SYSTOR '13.
[19] Sorin Faibish,et al. Jitter-free co-processing on a prototype exascale storage stack , 2012, 2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST).
[20] Stephen L. Scott,et al. A reliability-aware approach for an optimal checkpoint/restart model in HPC environments , 2007, 2007 IEEE International Conference on Cluster Computing.
[21] Bronis R. de Supinski,et al. Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[22] Saurabh Gupta,et al. Best Practices and Lessons Learned from Deploying and Operating Large-Scale Data-Centric Parallel File Systems , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[23] Tian Luo,et al. CAFTL: A Content-Aware Flash Translation Layer Enhancing the Lifespan of Flash Memory based Solid State Drives , 2011, FAST.
[24] Youyou Lu,et al. Extending the lifetime of flash-based storage through reducing write amplification from file systems , 2013, FAST.
[25] Satoshi Matsuoka,et al. A User-Level InfiniBand-Based File System and Checkpoint Strategy for Burst Buffers , 2014, 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.
[26] Lorenz T. Biegler,et al. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming , 2006, Math. Program..
[27] Evangelos Eleftheriou,et al. Write amplification analysis in flash-based solid state drives , 2009, SYSTOR '09.
[28] Andrew A. Chien,et al. How Much SSD Is Useful for Resilience in Supercomputers , 2015, FTXS@HPDC.
[29] Bianca Schroeder,et al. A Large-Scale Study of Failures in High-Performance Computing Systems , 2006, IEEE Transactions on Dependable and Secure Computing.
[30] Kai Shen,et al. A performance evaluation of scientific I/O workloads on Flash-based SSDs , 2009, 2009 IEEE International Conference on Cluster Computing and Workshops.
[31] Saurabh Gupta,et al. Lazy Checkpointing: Exploiting Temporal Locality in Failures to Mitigate Checkpointing Overheads on Extreme-Scale Systems , 2014, 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks.
[32] Xubin He,et al. Delta-FTL: improving SSD lifetime via exploiting content locality , 2012, EuroSys '12.
[33] Nitin H. Vaidya,et al. Impact of Checkpoint Latency on Overhead Ratio of a Checkpointing Scheme , 1997, IEEE Trans. Computers.
[34] Satoshi Matsuoka,et al. Design and modeling of a non-blocking checkpointing system , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[35] Lipeng Wan,et al. A Report on Simulation-Driven Reliability and Failure Analysis of Large-Scale Storage Systems , 2014.
[36] Lipeng Wan,et al. SSD-optimized workload placement with adaptive learning and classification in HPC environments , 2014, 2014 30th Symposium on Mass Storage Systems and Technologies (MSST).