Automatic Run-time Parallelization and Transformation of I/O

As computational clusters grow, I/O can be expected to consume an increasing share of wall-clock time as problem sizes and node counts are scaled up, unless parallel I/O is introduced. Unfortunately, using parallel I/O is non-trivial, so few applications developed by individual researchers benefit from it. In this paper, we describe a novel method for analyzing I/O and communication operations at run-time. When nodes perform I/O or communication operations, our technique protects the memory associated with those requests from access by the application. Subsequent operations are then analyzed for overlap between communication and I/O operations. When such overlap is found, our injected library automatically transforms the I/O operation from an individual operation into a collective, shared MPI I/O operation. This allows users to benefit from parallel file systems without redesigning or recompiling their applications, and we demonstrate speedups for common usage patterns.
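The overlap analysis can be illustrated with a minimal sketch. It assumes that "overlap" means a write whose buffer reuses memory previously filled by a communication operation, as in a gather-to-root-then-write pattern. The names `Region`, `OverlapTracker`, `on_receive`, and `on_write` are hypothetical and are not taken from the paper's library. The sketch only records byte ranges; it does not perform the actual page protection or the rewrite into MPI I/O calls.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A half-open byte range [start, start + length) in the address space."""
    start: int
    length: int

    @property
    def end(self) -> int:
        return self.start + self.length

    def overlaps(self, other: Region) -> bool:
        # Two half-open ranges intersect iff each starts before the other ends.
        return self.start < other.end and other.start < self.end


class OverlapTracker:
    """Hypothetical bookkeeping: remembers buffers filled by communication
    and reports when a later write reuses any of them."""

    def __init__(self) -> None:
        self.comm_regions: list[Region] = []

    def on_receive(self, addr: int, nbytes: int) -> None:
        # A real implementation would protect these pages here (assumption);
        # this sketch only records the range.
        self.comm_regions.append(Region(addr, nbytes))

    def on_write(self, addr: int, nbytes: int) -> bool:
        # True means the write consumes received data, making it a candidate
        # for conversion into a collective, shared MPI I/O operation.
        write = Region(addr, nbytes)
        return any(write.overlaps(r) for r in self.comm_regions)


if __name__ == "__main__":
    tracker = OverlapTracker()
    tracker.on_receive(0x1000, 4096)         # buffer filled by communication
    print(tracker.on_write(0x1800, 1024))    # True: write reuses received data
    print(tracker.on_write(0x3000, 512))     # False: unrelated buffer
```

Half-open ranges keep adjacent buffers (one ending exactly where the next begins) from being reported as overlapping.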