Automatic, Application-Aware I/O Forwarding Resource Allocation

The I/O forwarding architecture is widely adopted on modern supercomputers, with a layer of intermediate nodes sitting between the many compute nodes and the backend storage nodes. This allows compute nodes to run more efficiently and stably with a leaner OS, offloads I/O coordination and backend communication from the compute nodes, maintains fewer concurrent connections to the storage system, and provides additional resources for effective caching, prefetching, write buffering, and I/O aggregation. On many existing machines, however, these forwarding nodes are statically assigned to serve a fixed set of compute nodes. We explore DFRA, an automatic mechanism for application-adaptive dynamic forwarding resource allocation. We use I/O monitoring data that proves affordable to acquire in real time and to maintain for long-term history analysis. Upon each job's dispatch, DFRA conducts a history-based study to determine whether the job should be granted more forwarding resources or given dedicated forwarding nodes. Such customized I/O forwarding lets the small fraction of I/O-intensive applications achieve higher I/O performance and scalability, while effectively isolating disruptive I/O activities. We implemented, evaluated, and deployed DFRA on Sunway TaihuLight, currently the world's No. 3 supercomputer. DFRA improves applications' I/O performance by up to 18.9×, eliminates most inter-application I/O interference, and has saved over 200 million core-hours during its 11-month test deployment on TaihuLight. Finally, the proposed DFRA design is not platform-dependent, making it applicable to the management of existing and future I/O forwarding or burst buffer resources.
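
As a concrete illustration of the dispatch-time decision the abstract describes, the sketch below encodes a history-based allocation policy in Python. The `IOHistory` fields, the numeric thresholds, and the three-way outcome are hypothetical placeholders chosen for readability; they are not DFRA's actual interface or parameters.

```python
# Minimal sketch of the dispatch-time decision flow described above.
# The monitoring fields, thresholds, and policy below are hypothetical
# placeholders, not the authors' actual DFRA implementation.

from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Allocation(Enum):
    DEFAULT = auto()    # keep the machine's fixed compute-to-forwarding mapping
    SCALED_UP = auto()  # grant the job additional forwarding nodes
    DEDICATED = auto()  # isolate the job on dedicated forwarding nodes


@dataclass
class IOHistory:
    """Aggregate I/O statistics mined from long-term monitoring logs."""
    peak_bandwidth_mbps: float   # highest observed forwarding throughput
    metadata_ops_per_sec: float  # observed metadata request rate
    interference_events: int     # past incidents of disrupting co-located jobs


def decide_allocation(history: Optional[IOHistory]) -> Allocation:
    """History-based allocation decision made when a job is dispatched.

    Thresholds are illustrative; a real deployment would derive them
    from the platform's per-forwarding-node capacity.
    """
    if history is None:
        # No prior runs on record: fall back to the default mapping.
        return Allocation.DEFAULT
    if history.interference_events > 0:
        # Disruptive I/O behavior gets isolated on dedicated nodes.
        return Allocation.DEDICATED
    if (history.peak_bandwidth_mbps > 10_000
            or history.metadata_ops_per_sec > 5_000):
        # I/O-intensive but well-behaved: grant more forwarding resources.
        return Allocation.SCALED_UP
    return Allocation.DEFAULT
```

Under this reading, the policy only upgrades a job's allocation when its recorded history justifies it, so the vast majority of jobs keep the default mapping and the extra forwarding capacity is reserved for the small I/O-intensive fraction.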
