Runtime support for unstructured data accesses on coarse-grained, distributed-memory parallel machines

The parallelization of several applications results in unstructured data accesses on coarse-grained, distributed-memory parallel machines. In many cases these irregular access patterns are known only at runtime and are nonrepetitive. They may also cause load imbalance in both communication and local computation. Thus, to achieve good performance, it is necessary to provide efficient runtime support for unstructured data accesses. In this dissertation we present techniques for efficient software support that minimize communication overhead for such applications. These new techniques are relatively scalable and architecture-independent. We have developed several optimizations for reducing the overall communication cost. In particular, we have developed techniques to support the following classes of applications: (1) breadth-first search-based applications: Wolff's cluster algorithm and Lee's maze-routing algorithm; (2) HPF array construction functions: PACK/UNPACK, along with HPF array reduction and prefix/suffix functions; (3) the HPF generalized array reduction function: the array combining scatter function. In addition, we have developed parallel algorithms for vector reduction and prefix primitives that can be used effectively in many applications with structured as well as unstructured data accesses. For some of these applications, runtime support can be provided by several competing techniques, and the best one depends on the size of the data sets as well as on the access patterns. For such scenarios we have developed decision systems that choose the best technique by performing simple measurements at runtime. These claims are supported by extensive experimental results on the CM-5.
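For readers unfamiliar with the array functions named above, the following is a minimal sequential sketch of what PACK, UNPACK, and a sum-combining scatter compute on one-dimensional data. It illustrates only the input/output semantics and is not the parallel runtime support developed in the dissertation. The function names `pack`, `unpack`, and `sum_scatter` are chosen here for illustration, and indices are 0-based, unlike Fortran's default 1-based indexing.

```python
def pack(array, mask):
    """Gather the elements of `array` whose mask entry is true, in order."""
    return [x for x, m in zip(array, mask) if m]


def unpack(vector, mask, field):
    """Scatter `vector` into the true positions of `mask`.

    Positions where the mask is false take the corresponding `field` value.
    """
    it = iter(vector)
    return [next(it) if m else f for m, f in zip(mask, field)]


def sum_scatter(array, base, index):
    """Combining scatter with addition (illustrative, 0-based indices).

    Each array[k] is added to the copy of `base` at position index[k];
    elements that collide on the same position are summed.
    """
    result = list(base)
    for x, i in zip(array, index):
        result[i] += x
    return result


if __name__ == "__main__":
    mask = [True, False, True, False]
    packed = pack([10, 20, 30, 40], mask)
    print(packed)
    print(unpack(packed, mask, [0, 0, 0, 0]))
    print(sum_scatter([1, 2, 3, 4], [0, 0, 0], [0, 2, 0, 2]))
```

The index vector in the scatter is what makes the access pattern data-dependent: on a distributed-memory machine, the destination of each element, and hence the communication it requires, is known only once the index values are available at runtime.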