Scalable parallel I/O alternatives for massively parallel partitioned solver systems

With the development of high-performance computing, I/O has become a bottleneck for many massively parallel applications. This paper investigates scalable parallel I/O alternatives for massively parallel partitioned solver systems. Such systems typically have synchronized “loops” and write data in a well-defined block I/O format consisting of a header and a data portion. Our target use for such a parallel I/O subsystem is checkpoint-restart, where writing is by far the most common operation and reading typically occurs only during initialization or during a restart after a system failure. We compare four parallel I/O strategies: one POSIX file per processor (1PFPP), a synchronized parallel I/O library (syncIO), “Poor-Man's” Parallel I/O (PMPIO), and a new “reduced-blocking” strategy (rbIO). Performance tests using real CFD solver data from PHASTA (an unstructured-grid finite element Navier-Stokes solver [1]) show that the syncIO strategy can achieve a read bandwidth of 6.6 GB/s on Blue Gene/L using 16K processors, which is significantly faster than the 1PFPP or PMPIO approaches. For writing, the serial “token-passing” approach of PMPIO yields 900 MB/s on 16K processors using 1024 files, and 1PFPP achieves 600 MB/s on 8K processors, whereas the rbIO strategy achieves an actual write bandwidth of 2.3 GB/s and a perceived (latency-hiding) write bandwidth of more than 21,000 GB/s (i.e., 21 TB/s) on a 32,768-processor Blue Gene/L.
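The abstract characterizes PMPIO only as a serial “token-passing” scheme that writes to a fixed number of files (1024 files on 16K processors). The following is a minimal sketch, assuming the usual PMPIO layout: ranks are divided into contiguous groups, each group shares one file, and writes within a group are serialized by handing a token from rank to rank. Python threads stand in for MPI processes, and the names `pmpio_role`, `simulate_pmpio` and `read_blocks`, as well as the block header layout (rank, payload length), are hypothetical illustrations, not the paper's block I/O format or the PMPIO library API.

```python
import struct
import threading
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PmpioRole:
    group: int                # index of the shared file this rank appends to
    prev_rank: Optional[int]  # rank that passes us the token (None: first writer)
    next_rank: Optional[int]  # rank we pass the token to (None: last writer)


def pmpio_role(rank: int, nprocs: int, nfiles: int) -> PmpioRole:
    """Split nprocs ranks into nfiles contiguous groups, one shared file per group."""
    if not 1 <= nfiles <= nprocs:
        raise ValueError("need 1 <= nfiles <= nprocs")
    base, extra = divmod(nprocs, nfiles)  # the first `extra` groups get one extra rank
    if rank < extra * (base + 1):
        group = rank // (base + 1)
    else:
        group = extra + (rank - extra * (base + 1)) // base
    first = group * base + min(group, extra)
    last = first + base + (1 if group < extra else 0) - 1
    return PmpioRole(
        group=group,
        prev_rank=rank - 1 if rank > first else None,
        next_rank=rank + 1 if rank < last else None,
    )


def simulate_pmpio(nprocs: int, nfiles: int) -> List[bytes]:
    """Each 'rank' is a thread; inside a group only the token holder may write."""
    files = [bytearray() for _ in range(nfiles)]
    token = [threading.Event() for _ in range(nprocs)]

    def rank_main(rank: int) -> None:
        role = pmpio_role(rank, nprocs, nfiles)
        data = f"field data from rank {rank}".encode()
        if role.prev_rank is not None:
            token[rank].wait()  # block until the predecessor hands over the token
        # Hypothetical block: header (rank, payload length) followed by the data.
        files[role.group] += struct.pack("<ii", rank, len(data)) + data
        if role.next_rank is not None:
            token[role.next_rank].set()  # pass the token to the next rank in the group

    threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(nprocs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [bytes(f) for f in files]


def read_blocks(buf: bytes) -> List[int]:
    """Walk a simulated file and return the rank stored in each block header."""
    ranks, off = [], 0
    while off < len(buf):
        rank, n = struct.unpack_from("<ii", buf, off)
        ranks.append(rank)
        off += 8 + n  # skip the 8-byte header and the payload
    return ranks
```

With 10 ranks and 4 files, `[read_blocks(f) for f in simulate_pmpio(10, 4)]` returns `[[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]`: blocks land in each file strictly in rank order regardless of thread scheduling. The sketch also shows why this scheme limits write parallelism: inside a group only one rank writes at a time, so at most `nfiles` writes proceed concurrently.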

[1] John Shalf, et al. Using IOR to analyze the I/O Performance for HPC Platforms, 2007.

[2] Onkar Sahni, et al. Strong scaling analysis of a parallel, unstructured, implicit solver and the influence of the operating system interference, 2009, Sci. Program.

[3] Wei-keng Liao, et al. Scaling parallel I/O performance through I/O delegate and caching system, 2008, HiPC 2008.

[4] John Shalf, et al. Characterizing and predicting the I/O performance of HPC applications using a parameterized synthetic benchmark, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.

[5] Leonid Oliker, et al. Investigation of leading HPC I/O performance using a scientific-application derived benchmark, 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).

[6] Alexander I. Suciu, et al. Lower central series and free resolutions of hyperplane arrangements, 2001, math/0109070.

[7] Onkar Sahni, et al. Scalable implicit finite element solver for massively parallel processing with demonstration to 160K cores, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.

[8] Diego Donzis, et al. Dissipation and enstrophy in isotropic turbulence: Resolution effects and scaling in direct numerical simulations, 2008.

[9] Wei-keng Liao, et al. Scaling parallel I/O performance through I/O delegate and caching system, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.

[10] И.В. Булдашев, et al. Determination of the self-diffusion coefficient of water in the Gromacs package, 2011.

[11] Robert Latham, et al. I/O performance challenges at leadership scale, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.

[12] Robert Latham, et al. Scalable I/O and analytics, 2009.

[13] Subhash Saini, et al. Parallel I/O Performance Characterization of Columbia and NEC SX-8 Superclusters, 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.

[14] Mark R. Fahey, et al. I/O performance on a massively parallel Cray XT3/XT4, 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.

[15] Robert Latham, et al. High performance file I/O for the Blue Gene/L supercomputer, 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006.

[16] Rajeev Thakur, et al. Users guide for ROMIO: A high-performance, portable MPI-IO implementation, 1997.

[17] Frank B. Schmuck, et al. GPFS: A Shared-Disk File System for Large Computing Clusters, 2002, FAST.

[18] Jeffrey S. Vetter, et al. Performance characterization and optimization of parallel I/O on the Cray XT, 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.

[19] Phillip M. Dickens, et al. Y-lib: a user level library to increase the performance of MPI-IO in a lustre file system environment, 2009, HPDC '09.

[20] Leonid Oliker, et al. HPC global file system performance analysis using a scientific-application derived benchmark, 2009, Parallel Comput.