Topology-aware data movement and staging for I/O acceleration on Blue Gene/P supercomputing systems

There is growing concern that I/O systems will be hard pressed to satisfy the requirements of future leadership-class machines; even current machines are I/O bound for some applications. In this paper, we identify existing performance bottlenecks in data movement for I/O on the IBM Blue Gene/P (BG/P) supercomputer currently deployed at several leadership computing facilities. We improve I/O performance by exploiting the network topology of BG/P for collective I/O, leveraging the data semantics of applications, and incorporating asynchronous data staging. We demonstrate the efficacy of our approaches with synthetic benchmark experiments and with application-level benchmarks at scale on leadership computing systems.
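The paper's own mechanisms are not reproduced here, but the core collective-I/O idea, grouping compute ranks by their position in the machine topology and funneling each group's data through one aggregator, can be illustrated with standard MPI. The sketch below assumes a fixed number of compute nodes per I/O node (the placeholder `PSET_SIZE`) and a uniform per-rank buffer (`BLOCK`); both constants and the file name `out.dat` are illustrative, not BG/P API values.

```c
/* Minimal sketch of topology-aware two-phase collective I/O,
 * assuming PSET_SIZE consecutive ranks share one I/O node.
 * Rank 0 of each group acts as aggregator: it gathers the
 * group's buffers and issues a single large MPI-IO write. */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

#define PSET_SIZE 64          /* assumed compute nodes per I/O node */
#define BLOCK     (1 << 20)   /* 1 MiB per rank, for illustration   */

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Group ranks by topology: ranks that share an I/O node
     * (here approximated by rank / PSET_SIZE) get one color. */
    MPI_Comm pset;
    MPI_Comm_split(MPI_COMM_WORLD, rank / PSET_SIZE, rank, &pset);
    int prank, psize;
    MPI_Comm_rank(pset, &prank);
    MPI_Comm_size(pset, &psize);

    char *mine = malloc(BLOCK);
    memset(mine, rank & 0xff, BLOCK);

    /* Aggregation phase: the group leader collects the group's data. */
    char *agg = (prank == 0) ? malloc((size_t)psize * BLOCK) : NULL;
    MPI_Gather(mine, BLOCK, MPI_BYTE, agg, BLOCK, MPI_BYTE, 0, pset);

    /* I/O phase: only aggregators touch the file system, and each
     * writes one contiguous region instead of psize small requests. */
    if (prank == 0) {
        MPI_File fh;
        MPI_File_open(MPI_COMM_SELF, "out.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &fh);
        MPI_Offset off =
            (MPI_Offset)(rank / PSET_SIZE) * PSET_SIZE * BLOCK;
        MPI_File_write_at(fh, off, agg, psize * BLOCK, MPI_BYTE,
                          MPI_STATUS_IGNORE);
        MPI_File_close(&fh);
        free(agg);
    }

    free(mine);
    MPI_Comm_free(&pset);
    MPI_Finalize();
    return 0;
}
```

In the paper's setting the grouping would follow the actual BG/P partition layout rather than a rank-arithmetic approximation, and the aggregator's write could be handed to a staging thread so computation proceeds while data drains asynchronously.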
