Automatic optimization of communication in compiling out-of-core stencil codes

Rajesh Bordawekar, Alok Choudhary
ECE Dept., 121 Link Hall, Syracuse University, Syracuse, NY 13244
{rajesh, choudhar}@cat.syr.edu

J. Ramanujam
ECE Dept., Louisiana State University, Baton Rouge, LA 70803
jxr@gate.ee.lsu.edu

Abstract

In this paper, we describe a technique for optimizing communication in out-of-core distributed-memory stencil problems. In these problems, communication may require both inter-processor communication and file I/O. We show that in certain cases, the extra file I/O incurred in communication can be completely eliminated by reordering in-core computations. The in-core computation pattern is determined by (1) how the out-of-core data is distributed into in-core slabs (tiling) and (2) how the slabs are accessed. We show that a compiler using the stencil and processor information can choose the tiling parameters and schedule the tile accesses so that the extra file I/O is eliminated and overall performance is improved.
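To make the role of tiling and slab ordering concrete, the following is a minimal toy model, not the algorithm developed in this paper. It assumes a 5-point stencil, processors arranged in a one-dimensional row with a column-block distribution, one slab in memory per processor at a time, and lockstep execution. The function name `extra_file_reads`, the `tiling` labels, and the `schedules` layout are all hypothetical names introduced for illustration. A ghost-data request counts as an extra file read whenever the neighbor's slab that holds the needed boundary data is not the slab that neighbor currently has in memory.

```python
def extra_file_reads(num_procs, num_slabs, tiling, schedules):
    """Count ghost-data requests that miss the neighbor's in-core slab.

    Toy model (an assumption, not the paper's cost model):
      - schedules[p][t] is the slab index processor p holds in memory at step t
      - tiling == "row": slabs cut across the processor boundary, so slab s
        needs the same rows (slab s) of each neighbor's boundary column
      - tiling == "column": slabs run parallel to the processor boundary, so
        only the first and last slabs touch a neighbor
    """
    last = num_slabs - 1
    misses = 0
    for t in range(num_slabs):
        for p in range(num_procs):
            mine = schedules[p][t]
            for q in (p - 1, p + 1):
                if not 0 <= q < num_procs:
                    continue  # no neighbor on this side
                if tiling == "row":
                    needed = mine
                elif q > p:
                    # Right neighbor's first column lives in its slab 0 and is
                    # needed only while p processes its last slab.
                    if mine != last:
                        continue
                    needed = 0
                else:
                    # Left neighbor's last column lives in its last slab and is
                    # needed only while p processes its first slab.
                    if mine != 0:
                        continue
                    needed = last
                if schedules[q][t] != needed:
                    misses += 1  # boundary data is on disk: extra file read
    return misses


num_procs, num_slabs = 4, 4
forward = list(range(num_slabs))
backward = forward[::-1]

# Every processor sweeps its slabs in the same order.
same_order = [forward for _ in range(num_procs)]
# Neighboring processors sweep in opposite orders.
alternating = [forward if p % 2 == 0 else backward for p in range(num_procs)]

column_same = extra_file_reads(num_procs, num_slabs, "column", same_order)
column_alternating = extra_file_reads(num_procs, num_slabs, "column", alternating)
row_same = extra_file_reads(num_procs, num_slabs, "row", same_order)
```

In this toy setting, column slabs swept in the same order on every processor incur extra file reads at both ends of the sweep, while reversing the sweep on alternate processors, or tiling by rows with a common order, removes them. This only illustrates how the choice of tiling and access schedule interact; the actual selection of tiling parameters depends on the stencil and processor information as stated in the abstract.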