Toward Managing HPC Burst Buffers Effectively: Draining Strategy to Regulate Bursty I/O Behavior
Tao Lu | Ping Huang | Xubin He | Kun Tang | Devesh Tiwari | Sudharshan S. Vazhkudai
[1] Hao Yang,et al. Support for Provisioning and Configuration Decisions for Data Intensive Workflows , 2016, IEEE Transactions on Parallel and Distributed Systems.
[2] John Shalf,et al. Using IOR to analyze the I/O Performance for HPC Platforms , 2007 .
[3] Ping Huang,et al. Power-Capping Aware Checkpointing: On the Interplay Among Power-Capping, Temperature, Reliability, Performance, and Energy , 2016, 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).
[4] Antony I. T. Rowstron,et al. Write off-loading: Practical power management for enterprise storage , 2008, TOS.
[5] Rina Panigrahy,et al. Design Tradeoffs for SSD Performance , 2008, USENIX ATC.
[6] Irfan Ahmad,et al. PARDA: Proportional Allocation of Resources for Distributed Storage Access , 2009, FAST.
[7] Saurabh Gupta,et al. Understanding and Exploiting Spatial Properties of System Failures on Extreme-Scale HPC Systems , 2015, 2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks.
[8] Teng Wang,et al. An Ephemeral Burst-Buffer File System for Scientific Applications , 2016 .
[9] Antony I. T. Rowstron,et al. Everest: Scaling Down Peak Loads Through I/O Off-Loading , 2008, OSDI.
[10] Bin Nie,et al. A large-scale study of soft-errors on GPUs in the field , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[11] Matei Ripeanu,et al. The case for a versatile storage system , 2010, OPSR.
[12] John Shalf,et al. Exascale Computing Technology Challenges , 2010, VECPAR.
[13] Saurabh Gupta,et al. Best Practices and Lessons Learned from Deploying and Operating Large-Scale Data-Centric Parallel File Systems , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[14] Margaret H. Wright,et al. The opportunities and challenges of exascale computing , 2010 .
[15] Gregory R. Ganger,et al. The DiskSim Simulation Environment Version 4.0 Reference Manual (CMU-PDL-08-101) , 1998 .
[16] Gregory R. Ganger,et al. Argon: Performance Insulation for Shared Storage Servers , 2007, FAST.
[17] Robert Latham,et al. Understanding and improving computational science storage access through continuous characterization , 2011, MSST.
[18] Bronis R. de Supinski,et al. Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[19] Samuel Lang,et al. Server-side I/O coordination for parallel file systems , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[20] Robert Latham,et al. Leveraging burst buffer coordination to prevent I/O interference , 2016, 2016 IEEE 12th International Conference on e-Science (e-Science).
[21] Scott Klasky,et al. Characterizing output bottlenecks in a supercomputer , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[22] Yang Liu,et al. Automatic identification of application I/O signatures from noisy server-side traces , 2014, FAST.
[23] Franck Cappello,et al. FTI: High performance Fault Tolerance Interface for hybrid systems , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[24] Purushotham Bangalore,et al. Managing I/O Interference in a Shared Burst Buffer System , 2016, 2016 45th International Conference on Parallel Processing (ICPP).
[25] Nicholas J. Wright,et al. Architecture and Design of Cray DataWarp , 2016 .
[26] Saurabh Gupta,et al. Reliability lessons learned from GPU experience with the Titan supercomputer at Oak Ridge leadership computing facility , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[27] Robert Latham,et al. Storage Access Characteristics of Computational Science Applications , 2010 .
[28] Peter M. Chen,et al. Striping in a RAID level 5 disk array , 1995, SIGMETRICS '95/PERFORMANCE '95.
[29] Robert B. Ross,et al. On the role of burst buffers in leadership-class storage systems , 2012, 2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST).
[30] Hao Yang,et al. Supporting storage configuration for I/O intensive workflows , 2014, ICS '14.
[31] Feiyi Wang,et al. OLCF's 1 TB/s, Next-Generation Lustre File System , 2013 .
[32] Randy H. Katz,et al. An analytic performance model of disk arrays , 1993, SIGMETRICS '93.
[33] Galen M. Shipman,et al. The Spider Center Wide File System: From Concept to Reality , 2009 .
[34] Luigi Carro,et al. Understanding GPU errors on large-scale HPC systems and the implications for system design and operation , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[35] Don E Maxwell,et al. Monitoring Tools for Large Scale Systems , 2010 .
[36] Bronis R. de Supinski,et al. Detailed Modeling, Design, and Evaluation of a Scalable Multi-level Checkpointing System , 2010 .
[37] Karsten Schwan,et al. DataStager: scalable data staging services for petascale applications , 2009, HPDC '09.
[38] Robert B. Ross,et al. CALCioM: Mitigating I/O Interference in HPC Systems through Cross-Application Coordination , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[39] Anees Shaikh,et al. Performance Isolation and Fairness for Multi-Tenant Cloud Storage , 2012, OSDI.
[40] Saurabh Gupta,et al. Lazy Checkpointing: Exploiting Temporal Locality in Failures to Mitigate Checkpointing Overheads on Extreme-Scale Systems , 2014, 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks.
[41] Teng Wang,et al. TRIO: Burst Buffer Based I/O Orchestration , 2015, 2015 IEEE International Conference on Cluster Computing.
[42] Parosh Aziz Abdulla. Impact of Architecture and Technology for Extreme Scale on Software and Algorithm Design , 2010 .