Fast Parallel Non-Contiguous File Access

Many applications of parallel I/O perform non-contiguous file accesses: instead of accessing a single (large) block of data in a file, a number of (smaller) blocks of data scattered throughout the file needs to be accessed in each logical I/O operation. However, only few file system interfaces directly support this kind of non-contiguous file access. In contrast, the most commonly used parallel programming interface, MPI, incorporates a exible model of parallel I/O through its MPI-IO interface. With MPI-IO, arbitrary non-contiguous file accesses are supported in a uniform fashion by the use of derived MPI datatypes set up by the user to re ect the desired I/O pattern. Despite a considerable amount of recent work in this area, current MPI-IO implementations suffer from low performance of such non-contiguous accesses when compared to the performance of the storage system for contiguous accesses. In this paper we analyze an important bottleneck in the efficient handling of non-contiguous access patterns in current implementations of MPI-IO. We present a new technique, termed listless I/O, that can be incorporated into MPI-IO implementations like the well-known ROMIO implementation, and completely eliminates this bottleneck. We have implemented the technique in MPI/SX, the MPI implementation for the NEC SX-series of parallel vector computers. Results with a synthetic benchmark and an application kernel show that listless I/O is able to increase the bandwidth for non-contiguous file access by sometimes more than a factor of 500 when compared to the traditional approach.

[1]  Robert B. Ross,et al.  Noncontiguous I/O through PVFS , 2002, Proceedings. IEEE International Conference on Cluster Computing.

[2]  Hubert Ritzdorf,et al.  Improving Generic Non-contiguous File Access for MPI-IO , 2003, PVM/MPI.

[3]  J. Shalf,et al.  Evaluation of Cache-based Superscalar and Cacheless Vector Architectures for Scientific Computations , 2003, ACM/IEEE SC 2003 Conference (SC'03).

[4]  Marianne Winslett,et al.  Improving MPI-IO output performance with active buffering plus threads , 2003, Proceedings International Parallel and Distributed Processing Symposium.

[5]  Rajeev Thakur,et al.  Data sieving and collective I/O in ROMIO , 1998, Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation.

[6]  Bin Jia,et al.  MPI-IO/GPFS, an Optimized Implementation of MPI-IO on Top of GPFS , 2001, ACM/IEEE SC 2001 Conference (SC'01).

[7]  Rajeev Thakur,et al.  Optimizing noncontiguous accesses in MPI-IO , 2002, Parallel Comput..

[8]  SkjellumAnthony,et al.  A high-performance, portable implementation of the MPI message passing interface standard , 1996 .

[9]  Hubert Ritzdorf,et al.  Flattening on the Fly: Efficient Handling of MPI Derived Datatypes , 1999, PVM/MPI.

[10]  David Jones High performance , 1989, Nature.

[11]  Marc Snir,et al.  The MPI core , 1998 .

[12]  Hubert Ritzdorf,et al.  The MPI/SX implementation of MPI for NEC's SX-6 and other NEC platforms , 2003 .

[13]  Rob VanderWijngaart,et al.  NAS Parallel Benchmarks I/O Version 2.4. 2.4 , 2002 .

[14]  Rajeev Thakur,et al.  On implementing MPI-IO portably and with high performance , 1999, IOPADS '99.

[15]  Jack Dongarra,et al.  Recent Advances in Parallel Virtual Machine and Message Passing Interface, 15th European PVM/MPI Users' Group Meeting, Dublin, Ireland, September 7-10, 2008. Proceedings , 2008, PVM/MPI.

[16]  Jack Dongarra,et al.  MPI: The Complete Reference , 1996 .

[17]  Hubert Ritzdorf,et al.  The Implementation of MPI-2 One-Sided Communication for the NEC SX-5 , 2000, ACM/IEEE SC 2000 Conference (SC'00).

[18]  Jack Dongarra,et al.  MPI - The Complete Reference: Volume 1, The MPI Core , 1998 .

[19]  Rajeev Thakur,et al.  Evaluation of Collective I/O Implementations on Parallel Architectures , 2001, J. Parallel Distributed Comput..

[20]  William Gropp,et al.  Mpi - The Complete Reference: Volume 2, the Mpi Extensions , 1998 .