Improved parallel I/O via a two-phase run-time access strategy

As scientists expand their models to describe physical phenomena of increasingly large extent, I/O becomes crucial and a system with limited I/O capacity can severely constrain the performance of the entire program.We provide experimental results, performed on an lntel Touchtone Delta and nCUBE 2 I/O system, to show that the performance of existing parallel I/O systems can vary by several orders of magnitude as a function of the data access pattern of the parallel program. We then propose a two-phase access strategy, to be implemented in a runtime system, in which the data distribution on computational nodes is decoupled from storage distribution. Our experimental results show that performance improvements of several orders of magnitude over direct access based data distribution methods can be obtained, and that performance for most data access patterns can be improved to within a factor of 2 of the best performance. Further, the cost of redistribution is a very small fraction of the overall access cost.