Parallelization of Irregular Out-of-Core Applications for Distributed-Memory Systems

Large-scale irregular applications involve data arrays and other data structures that are too large to fit in main memory and hence reside on disk; such applications are called out-of-core applications. This paper presents techniques for implementing applications of this kind. In particular, we present the design of a runtime system that efficiently supports parallel execution of irregular out-of-core codes on distributed-memory systems. Furthermore, we describe the program transformations required to reduce the I/O overhead of staging data, as well as the communication overhead, while maintaining load balance. The proposed techniques can be used by a parallelizing compiler or by users writing programs in the node + message-passing style. We have completed a preliminary implementation of these techniques and present experimental results from a template CFD code to demonstrate their efficacy.