Compiler parallelization of SIMPLE for a distributed memory machine

In machines like the Intel iPSC/2 and the BBN Butterfly, local memory operations are much faster than inter-processor communication. When writing programs for these machines, programmers must worry about exploiting spatial locality of reference. This is tedious and reduces the level of abstraction at the which the programmer works. We are implementing a parallelizing compiler that will shoulder much of that burden. Given a sequential, shared memory program and a specification of how data structures are to be mapped across the processors, our compiler will perform process decomposition to exploit locality of reference. In this paper, we discuss some experiments in parallelizing SIMPLE, a large scientific benchmark from Los Alamos, for the Intel iPSC/2.