Massively Parallel I/O for Partitioned Solver Systems

This paper investigates I/O approaches for massively parallel partitioned solver systems. Typically, such systems have synchronized "loops" and write data in a well-defined block I/O format consisting of a header portion and a data portion. Our target use for such a parallel I/O subsystem is checkpoint-restart, where writing is by far the most common operation and reading typically occurs only during initialization or during a restart caused by a system failure. We compare four parallel I/O strategies: one POSIX file per processor (1PFPP), "Poor Man's" parallel I/O (PMPIO), synchronized parallel I/O (syncIO), and a reduced-blocking strategy (rbIO). Performance tests executed on the Blue Gene/P at Argonne National Laboratory, using real CFD solver data from PHASTA (an unstructured-grid finite element Navier-Stokes solver), show that syncIO can achieve a read bandwidth of 47.4 GB/sec and a write bandwidth of 27.5 GB/sec on 128K processors. rbIO achieves an actual write bandwidth of 17.8 GB/sec, while its perceived write bandwidth is 166 TB/sec on 128K Blue Gene/P processors.
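
To make the syncIO strategy concrete, the following is a minimal sketch of a synchronized write into a single shared file using MPI-IO collective operations. The payload size `LOCAL_N`, the file name `restart.dat`, and the flat rank-based offset scheme are illustrative assumptions, not the paper's implementation; a real checkpoint would also write the header portion described above.

```c
#include <mpi.h>
#include <stdlib.h>

#define LOCAL_N (1 << 20)   /* doubles per rank, illustrative */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Fill a per-rank checkpoint buffer with placeholder data. */
    double *buf = malloc(LOCAL_N * sizeof(double));
    for (int i = 0; i < LOCAL_N; i++) buf[i] = rank + i * 1e-6;

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "restart.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);

    /* Collective write: every rank participates at a rank-computed
     * offset, so the MPI-IO layer can aggregate many per-rank requests
     * into large, aligned file-system accesses. */
    MPI_Offset offset = (MPI_Offset)rank * LOCAL_N * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, LOCAL_N, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}
```

Because the write is collective, ROMIO-style aggregation can coalesce small per-rank requests into large contiguous accesses, which is what allows a synchronized strategy like syncIO to approach the machine's peak I/O bandwidth.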

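The rbIO numbers reflect a different design: compute ranks hand their checkpoint buffers to a small set of writer ranks with a nonblocking send and return to the solver almost immediately, so the bandwidth perceived by the solver far exceeds what the file system actually delivers. Below is a minimal sketch of that handoff, assuming one writer per 64 ranks and one file per writer group; the ratio, file layout, and all names are illustrative assumptions, not the paper's implementation.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define WRITER_GROUP 64      /* assumed: one writer rank per 64 ranks */
#define LOCAL_N (1 << 18)    /* doubles per rank, illustrative */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int writer = (rank / WRITER_GROUP) * WRITER_GROUP;  /* my group's writer */
    double *buf = malloc(LOCAL_N * sizeof(double));
    for (int i = 0; i < LOCAL_N; i++) buf[i] = rank;

    if (rank == writer) {
        /* Writer rank: gather the group's buffers, then do the real I/O. */
        int group_end = writer + WRITER_GROUP;
        if (group_end > size) group_end = size;
        int nsrc = group_end - writer;
        double *agg = malloc((size_t)nsrc * LOCAL_N * sizeof(double));
        for (int i = 0; i < LOCAL_N; i++) agg[i] = buf[i];  /* own data */
        for (int src = writer + 1; src < group_end; src++)
            MPI_Recv(agg + (size_t)(src - writer) * LOCAL_N, LOCAL_N,
                     MPI_DOUBLE, src, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        char fname[64];
        snprintf(fname, sizeof fname, "ckpt.%d", writer);
        FILE *f = fopen(fname, "wb");   /* one file per writer group */
        fwrite(agg, sizeof(double), (size_t)nsrc * LOCAL_N, f);
        fclose(f);
        free(agg);
    } else {
        /* Compute rank: post a nonblocking send and return to the solver;
         * the blocking cost it perceives is only this handoff, not the
         * actual disk write performed by the writer rank. */
        MPI_Request req;
        MPI_Isend(buf, LOCAL_N, MPI_DOUBLE, writer, 0, MPI_COMM_WORLD, &req);
        /* ... solver work continues here; complete the send before
         * reusing buf for the next checkpoint ... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```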