How Much SSD Is Useful for Resilience in Supercomputers
[1] John Daly. A Model for Predicting the Optimum Checkpoint Interval for Restart Dumps , 2003, International Conference on Computational Science.
[2] Bu-Sung Lee,et al. Cost Minimization for Provisioning Virtual Servers in Amazon Elastic Compute Cloud , 2011, 2011 IEEE 19th Annual International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems.
[3] John T. Daly,et al. A higher order estimate of the optimum checkpoint interval for restart dumps , 2006, Future Gener. Comput. Syst.
[4] Mohammad A. Khaleel. Scientific Grand Challenges: Crosscutting Technologies for Computing at the Exascale - February 2-4, 2010, Washington, D.C. , 2011 .
[5] Xiaola Lin,et al. A Variational Calculus Approach to Optimal Checkpoint Placement , 2001, IEEE Trans. Computers.
[6] Victor F. Nicola,et al. Checkpointing and the modeling of program execution time , 1994 .
[7] Dhabaleswar K. Panda,et al. Enhancing Checkpoint Performance with Staging IO and SSD , 2010, 2010 International Workshop on Storage Network Architecture and Parallel I/Os.
[8] Ravishankar K. Iyer,et al. Lessons Learned from the Analysis of System Failures at Petascale: The Case of Blue Waters , 2014, 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks.
[9] Derong Shen,et al. A Throughput Driven Task Scheduler for Improving MapReduce Performance in Job-Intensive Environments , 2013, 2013 IEEE International Congress on Big Data.
[10] Franck Cappello,et al. Toward Exascale Resilience: 2014 update , 2014, Supercomput. Front. Innov.
[11] Paul H. Siegel,et al. Characterizing flash memory: Anomalies, observations, and applications , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[12] Bronis R. de Supinski,et al. Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[13] Franck Cappello,et al. Fault Tolerance in Petascale/Exascale Systems: Current Knowledge, Challenges and Research Opportunities , 2009, Int. J. High Perform. Comput. Appl.
[14] John Bent,et al. Storage challenges at Los Alamos National Lab , 2012, 2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST).
[15] Robert B. Ross,et al. On the role of burst buffers in leadership-class storage systems , 2012, 2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST).
[16] Franck Cappello,et al. Addressing failures in exascale computing , 2014, Int. J. High Perform. Comput. Appl.
[17] Satoshi Matsuoka,et al. A User-Level InfiniBand-Based File System and Checkpoint Strategy for Burst Buffers , 2014, 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.
[18] John W. Young,et al. A first order approximation to the optimum checkpoint interval , 1974, CACM.
[19] Cho-Li Wang,et al. Error-Tolerant Resource Allocation and Payment Minimization for Cloud System , 2013, IEEE Transactions on Parallel and Distributed Systems.
[20] Franck Cappello,et al. FTI: High performance Fault Tolerance Interface for hybrid systems , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[21] K. Mani Chandy,et al. Analytic models for rollback and recovery strategies in data base systems , 1975, IEEE Transactions on Software Engineering.
[22] Erol Gelenbe,et al. A model of roll-back recovery with multiple checkpoints , 1976, ICSE '76.
[23] Takeo Kanade,et al. High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation , 2014, Lecture Notes in Computer Science.
[24] Nitin H. Vaidya. A Case of Multi-Level Distributed Recovery Schemes , 2001 .
[25] S. Leyffer,et al. Software for Nonlinearly Constrained Optimization , 2011 .
[26] Qing Zhang,et al. Job Scheduling Optimization for Multi-user MapReduce Clusters , 2011, 2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming.
[27] Thomas Hérault,et al. Optimal Checkpointing Period: Time vs. Energy , 2013, PMBS@SC.
[28] Yinglin Wang,et al. A round robin with multiple feedback job scheduler in Hadoop , 2014, 2014 IEEE International Conference on Progress in Informatics and Computing.
[29] Andrew A. Chien,et al. Moore's Law: The First Ending and a New Beginning , 2013, Computer.
[30] Edward G. Coffman,et al. Scheduling Checks and Saves , 1992, INFORMS J. Comput..
[31] Michael Lang,et al. The design and implementation of a multi-level content-addressable checkpoint file system , 2012, 2012 19th International Conference on High Performance Computing.
[32] Franck Cappello,et al. Toward Exascale Resilience , 2009, Int. J. High Perform. Comput. Appl.
[33] B R de Supinski,et al. Detailed Modeling, Design, and Evaluation of a Scalable Multi-level Checkpointing System , 2010 .
[34] Laxmikant V. Kalé,et al. Maximizing Throughput of Overprovisioned HPC Data Centers Under a Strict Power Budget , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.