An unified parallel C compiler that implements automatic communication aggregation

Partitioned Global Address Space (PGAS) programming languages, such as Unified Parallel C (UPC), offer an attractive high-productivity programming model for programming large-scale parallel machines. PGAS languages partition the application’s address space into private, shared-local and shared-remote memory. When running in a distributed-memory environment, accessing shared-remote memory leads to implicit communication. For fine-grained accesses, which are frequently found in UPC programs, this communication overhead can significantly impact program performance. One solution for reducing the number of fine-grained accesses is to coalesce several accesses into a single access. This paper presents an analysis to identify opportunities for coalescing and an algorithm that allows the compiler to automatically coalesce accesses to shared-remote memory in UPC. It also describes how opportunities for coalescing can be created by the compiler through loop unrolling. Results obtained from coalescing accesses in manually-unrolled parallel loops are presented to demonstrate the benefit of combining parallel loop unrolling and communication coalescing.