Automatic, Application-Aware I/O Forwarding Resource Allocation

The I/O forwarding architecture is widely adopted on modern supercomputers, with a layer of intermediate nodes sitting between the many compute nodes and the backend storage nodes. This allows compute nodes to run more efficiently and stably with a leaner OS, offloads I/O coordination and backend communication from the compute nodes, maintains fewer concurrent connections to the storage system, and provides additional resources for effective caching, prefetching, write buffering, and I/O aggregation. On many existing machines, however, these forwarding nodes are statically assigned to serve a fixed set of compute nodes. We explore DFRA, an automatic mechanism for application-adaptive dynamic forwarding resource allocation. We use I/O monitoring data that proves affordable to acquire in real time and to maintain for long-term history analysis. Upon each job's dispatch, DFRA conducts a history-based study to determine whether the job should be granted more forwarding resources or given dedicated forwarding nodes. Such customized I/O forwarding lets the small fraction of I/O-intensive applications achieve higher I/O performance and scalability, while effectively isolating disruptive I/O activities. We implemented, evaluated, and deployed DFRA on Sunway TaihuLight, currently the world's No. 3 supercomputer. DFRA improves applications' I/O performance by up to 18.9×, eliminates most inter-application I/O interference, and has saved over 200 million core-hours during its 11-month test deployment on TaihuLight. Finally, the proposed DFRA design is not platform-dependent, making it applicable to the management of existing and future I/O forwarding or burst buffer resources.
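
As a concrete illustration of the dispatch-time decision the abstract describes, the sketch below encodes a history-based allocation policy in Python. The `IOHistory` fields, the numeric thresholds, and the three-way outcome are hypothetical placeholders chosen for readability; they are not DFRA's actual interface or parameters.

```python
# Minimal sketch of the dispatch-time decision flow described above.
# The monitoring fields, thresholds, and policy below are hypothetical
# placeholders, not the authors' actual DFRA implementation.

from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Allocation(Enum):
    DEFAULT = auto()    # keep the machine's fixed compute-to-forwarding mapping
    SCALED_UP = auto()  # grant the job additional forwarding nodes
    DEDICATED = auto()  # isolate the job on dedicated forwarding nodes


@dataclass
class IOHistory:
    """Aggregate I/O statistics mined from long-term monitoring logs."""
    peak_bandwidth_mbps: float   # highest observed forwarding throughput
    metadata_ops_per_sec: float  # observed metadata request rate
    interference_events: int     # past incidents of disrupting co-located jobs


def decide_allocation(history: Optional[IOHistory]) -> Allocation:
    """History-based allocation decision made when a job is dispatched.

    Thresholds are illustrative; a real deployment would derive them
    from the platform's per-forwarding-node capacity.
    """
    if history is None:
        # No prior runs on record: fall back to the default mapping.
        return Allocation.DEFAULT
    if history.interference_events > 0:
        # Disruptive I/O behavior gets isolated on dedicated nodes.
        return Allocation.DEDICATED
    if (history.peak_bandwidth_mbps > 10_000
            or history.metadata_ops_per_sec > 5_000):
        # I/O-intensive but well-behaved: grant more forwarding resources.
        return Allocation.SCALED_UP
    return Allocation.DEFAULT
```

Under this reading, the policy only upgrades a job's allocation when its recorded history justifies it, so the vast majority of jobs keep the default mapping and the extra forwarding capacity is reserved for the small I/O-intensive fraction.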
