The Performance of Parallel Disk Write Methods for Linux Multiprocessor Nodes

We experimentally evaluate several methods for implementing parallel computations that interleave a significant number of contiguous or strided writes to a local disk on Linux-based multiprocessor nodes. Using synthetic benchmark programs written with MPI and Pthreads, we have acquired detailed performance data for different application characteristics of programs running on dual processor nodes. In general, our results show that programs that perform a significant amount of I/O relative to pure computation benefit greatly from the use of threads, while programs that perform relatively little I/O obtain excellent results using only MPI. For a pure MPI approach, we have found that it is usually best to use two writing processes with mmap(). For Pthreads it is usually best to use two writing processes, write() for contiguous data, and writev() for strided data. Codes that use mmap() tend to benefit from periodic syncs of the data of the data to the disk, while codes that use write() or writev() tend to have better performance with few syncs. A straightforward use of ROMIO usually does not perform as well as these direct approaches for writing to the local disk.