Interference-Aware I/O Scheduling for Data-Intensive Applications on Hierarchical HPC Storage Systems

Scientific applications in critical areas are becoming increasingly data-intensive. As data volumes continue to grow, data movement from memory to the storage system has become a crucial performance bottleneck for many of these applications. The recently introduced burst buffer concept offers a promising solution: it deepens the storage hierarchy to improve the I/O performance of data-intensive applications. However, data management on such multi-layer hierarchical storage systems is still understudied, and how to leverage each storage layer for efficient data movement remains an important research topic for the HPC field. In this paper, we present a dynamic, interference-aware scheduling scheme that efficiently manages I/O scheduling across the layers of a hierarchical HPC storage system to coordinate multiple concurrent data-intensive applications. Extensive experiments demonstrate that the proposed approach can significantly improve the I/O performance of data-intensive applications.
