Revisiting I/O behavior in large-scale storage systems: the expected and the unexpected
Suren Byna | Tirthak Patel | Devesh Tiwari | Glenn K. Lockwood
[1] Scott Klasky,et al. Characterizing output bottlenecks in a supercomputer , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[2] Jeffrey S. Vetter,et al. TensorFlow Doing HPC , 2019, 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[3] Robert Latham,et al. Analysis and Correlation of Application I/O Performance and System-Wide I/O Activity , 2017, 2017 International Conference on Networking, Architecture, and Storage (NAS).
[4] Feiyi Wang,et al. OLCF's 1 TB/s, Next-Generation Lustre File System , 2013 .
[5] Song Huang,et al. Reliability Characterization of Solid State Drives in a Scalable Production Datacenter , 2018, 2018 IEEE International Conference on Big Data (Big Data).
[6] Ross Miller,et al. Comparative I/O workload characterization of two leadership class storage clusters , 2015, PDSW '15.
[7] Tao Lu,et al. Toward Managing HPC Burst Buffers Effectively: Draining Strategy to Regulate Bursty I/O Behavior , 2017, 2017 IEEE 25th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS).
[8] Jiesheng Wu,et al. Lessons and Actions: What We Learned from 10K SSD-Related Storage System Failures , 2019, USENIX Annual Technical Conference.
[9] Donald Beaver,et al. Dapper, a Large-Scale Distributed Systems Tracing Infrastructure , 2010 .
[10] Thomas W. Tucker,et al. The Lightweight Distributed Metric Service: A Scalable Infrastructure for Continuous Monitoring of Large Scale Computing Systems and Applications , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[11] Robert Latham,et al. 24/7 Characterization of petascale I/O workloads , 2009, 2009 IEEE International Conference on Cluster Computing and Workshops.
[12] Peter Desnoyers,et al. Active flash: towards energy-efficient, in-situ data analytics on extreme-scale machines , 2013, FAST.
[13] P. Cochat,et al. Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.
[14] Stephen A. Jarvis,et al. Parallel File System Analysis Through Application I/O Tracing , 2013, Comput. J..
[15] Scott Klasky,et al. Storage Systems and I/O: Organizing, Storing, and Accessing Data for Scientific Discovery (Report for the DOE ASCR Workshop on Storage Systems and I/O) , 2018 .
[16] Surendra Byna,et al. Accelerating Science with the NERSC Burst Buffer Early User Program , 2016 .
[17] Dong H. Ahn,et al. Scalable I/O-Aware Job Scheduling for Burst Buffer Enabled HPC Clusters , 2016, HPDC.
[18] Yang Liu,et al. Server-Side Log Data Analytics for I/O Workload Characterization and Coordination on Large Shared Storage Systems , 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[19] Saurabh Gupta,et al. Improving large-scale storage system performance via topology-aware and balanced data placement , 2014, 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS).
[20] Franck Cappello,et al. LOGAIDER: A Tool for Mining Potential Correlations of HPC Log Events , 2017, 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID).
[21] Scott Klasky,et al. Can I/O Variability Be Reduced on QoS-Less HPC Storage Systems? , 2019, IEEE Transactions on Computers.
[22] Raghul Gunasekaran,et al. Understanding I/O workload characteristics of a Peta-scale storage system , 2015, The Journal of Supercomputing.
[23] Sai Narasimhamurthy,et al. Characterizing Deep-Learning I/O Workloads in TensorFlow , 2018, 2018 IEEE/ACM 3rd International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems (PDSW-DISCS).
[24] Robert B. Ross,et al. CALCioM: Mitigating I/O Interference in HPC Systems through Cross-Application Coordination , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[25] Shane Snyder,et al. IOMiner: Large-Scale Analytics Framework for Gaining Knowledge from I/O Logs , 2018, 2018 IEEE International Conference on Cluster Computing (CLUSTER).
[26] Julian M. Kunkel,et al. The SIOX Architecture - Coupling Automatic Monitoring and Optimization of Parallel I/O , 2014, ISC.
[27] Lavanya Ramakrishnan,et al. AnalyzeThis: an analysis workflow-aware storage system , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[28] Leonid Oliker,et al. HPC global file system performance analysis using a scientific-application derived benchmark , 2009, Parallel Comput..
[29] Kevin Harms,et al. UMAMI: a recipe for generating meaningful metrics through holistic I/O performance analysis , 2017, PDSW-DISCS@SC.
[30] Robert B. Ross,et al. On the Root Causes of Cross-Application I/O Interference in HPC Storage Systems , 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[31] Shane Snyder,et al. Toward Understanding I/O Behavior in HPC Workflows , 2018, 2018 IEEE/ACM 3rd International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems (PDSW-DISCS).
[32] Samuel Williams,et al. Analyzing Performance of Selected NESAP Applications on the Cori HPC System , 2017, ISC Workshops.
[33] Weiguo Liu,et al. End-to-end I/O Monitoring on Leading Supercomputers , 2022, NSDI.
[34] Scott Klasky,et al. Predicting Output Performance of a Petascale Supercomputer , 2017, HPDC.
[35] Robert B. Ross,et al. Modular HPC I/O Characterization with Darshan , 2016, 2016 5th Workshop on Extreme-Scale Programming Tools (ESPT).
[36] Devarshi Ghoshal,et al. Performance Characterization of Scientific Workflows for the Optimal Use of Burst Buffers , 2017, WORKS@SC.
[37] Yong Chen,et al. PFault: A General Framework for Analyzing the Reliability of High-Performance Parallel File Systems , 2018, ICS.
[38] Peter Desnoyers,et al. Data Storage Research Vision 2025: Report on NSF Visioning Workshop held May 30–June 1, 2018 , 2018 .
[39] Weikuan Yu,et al. Challenges and Opportunities of User-Level File Systems for HPC , 2017 .
[40] Christian Engelmann,et al. Big Data Meets HPC Log Analytics: Scalable Approach to Understanding Systems at Extreme Scale , 2017, 2017 IEEE International Conference on Cluster Computing (CLUSTER).
[41] Devesh Tiwari,et al. GUIDE: A Scalable Information Directory Service to Collect, Federate, and Analyze Logs for Operational Insights into a Leadership HPC Facility , 2017, SC17: International Conference for High Performance Computing, Networking, Storage and Analysis.
[42] Leonid Oliker,et al. Investigation of leading HPC I/O performance using a scientific-application derived benchmark , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[43] Shane Snyder,et al. A Year in the Life of a Parallel File System , 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[44] Saurabh Gupta,et al. Best Practices and Lessons Learned from Deploying and Operating Large-Scale Data-Centric Parallel File Systems , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[45] Andrew Uselton,et al. A File System Utilization Metric for I/O Characterization , 2013 .
[46] Robert Latham,et al. Understanding and improving computational science storage access through continuous characterization , 2011, 2011 IEEE 27th Symposium on Mass Storage Systems and Technologies (MSST).
[47] Philip H. Carns,et al. Tools for Analyzing Parallel I/O , 2018, ISC Workshops.
[48] Devesh Tiwari,et al. A practical approach to reconciling availability, performance, and capacity in provisioning extreme-scale storage systems , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[49] Robert B. Ross,et al. Fail-Slow at Scale , 2018, ACM Trans. Storage.
[50] Samuel Lang,et al. Server-side I/O coordination for parallel file systems , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[51] Dror G. Feitelson,et al. The workload on parallel supercomputers: modeling the characteristics of rigid jobs , 2003, J. Parallel Distributed Comput..
[52] Kevin Harms,et al. TOKIO on ClusterStor: Connecting Standard Tools to Enable Holistic I/O Performance Analysis , 2018 .
[53] Yang Liu,et al. Automatic identification of application I/O signatures from noisy server-side traces , 2014, FAST.
[54] Robert Ricci,et al. Taming Performance Variability , 2018, OSDI.
[55] Franck Cappello,et al. Scheduling the I/O of HPC Applications Under Congestion , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium.
[56] Marianne Winslett,et al. A Multiplatform Study of I/O Behavior on Petascale Supercomputers , 2015, HPDC.