Compilation techniques for explicitly parallel programs

Compilers for explicitly parallel languages, such as Java, OpenMP, and Pthreads, face a fundamental problem, the solution of which is the focus of this thesis: data races and synchronization make it impossible to apply classical analysis and optimization techniques directly to shared memory parallel programs, because the classical methods do not account for updates to variables by threads other than the one being analyzed. In addition, many current multiprocessor architectures follow relaxed memory consistency models, which makes programming and porting more difficult. Yet sequential consistency, which is not a relaxed consistency model, is what most programmers assume when they program shared memory multiprocessors, even if they do not know exactly what sequential consistency is.

In this thesis, we present analysis and optimization techniques for an optimizing compiler for shared memory explicitly parallel programs. The compiler presents programmers with a sequentially consistent view of the underlying machine, whether the machine follows a sequentially consistent model or a relaxed one, and it allows optimization techniques to be applied correctly to parallel programs that conventional compilers cannot handle. To hide the underlying relaxed memory consistency model and guarantee sequential consistency, our algorithm inserts fence instructions. We reduce the number of fence instructions by exploiting the ordering constraints of the underlying memory consistency model and the ordering properties of fence instructions; to do so, we introduce a new concept, dominance with respect to a node in a control flow graph. We also show that reducing the number of fences by minimizing the number of nodes at which they are inserted is NP-hard.

We introduce two intermediate representations: the concurrent control flow graph and the concurrent static single assignment (CSSA) form. Based on these representations, we develop an analysis technique, concurrent global value numbering, by extending classical value partitioning and global value numbering, and we extend commonly used classical compiler optimizations to parallel programs. In this way, we guarantee the correctness (sequential consistency) of the optimized program and maintain single-processor performance in a multiprocessor environment. We also describe a technique for reducing parallel loop overhead.
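
To make the opening problem concrete, consider the following minimal sketch (our illustration, not an example from the thesis; the names producer, consumer, data, and ready are hypothetical). A compiler that analyzes the consumer thread in isolation sees no write to ready in that thread and could, by purely sequential reasoning, hoist the load of ready out of the spin loop or reuse a stale value of data. In fact, mainstream sequential compilers may do exactly that at higher optimization levels, which is precisely the unsoundness an analysis that accounts for updates in other threads must avoid.

/* Sketch: why sequential compiler analysis is unsound for
 * explicitly parallel programs.  The program is intentionally
 * racy; it is illustrative only. */
#include <pthread.h>
#include <stdio.h>

int data  = 0;   /* shared; written by the producer thread      */
int ready = 0;   /* shared flag; signals that data is valid      */

void *producer(void *arg) {
    (void)arg;
    data  = 42;  /* (1) produce the value                        */
    ready = 1;   /* (2) then publish the flag                    */
    return NULL;
}

void *consumer(void *arg) {
    (void)arg;
    /* A classical compiler, seeing no write to `ready` here,
     * could hoist the load out of the loop and spin forever,
     * or reuse a stale value of `data`.  An analysis that is
     * aware of other threads must suppress such transformations
     * (or prove them safe). */
    while (ready == 0)
        ;        /* spin until the producer publishes            */
    printf("data = %d\n", data);
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}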

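The effect of fence insertion can likewise be sketched on a classic Dekker-style fragment (again our illustration, using C11 atomic_thread_fence as a stand-in for a machine fence instruction; the thesis inserts the fences of the underlying architecture, and its contribution is placing as few of them as possible, using dominance with respect to a node). Under a relaxed model both threads below can read 0, an outcome impossible under sequential consistency; a fence between each thread's store and load forbids that reordering and restores the sequentially consistent result.

/* Sketch: compiler fence insertion restoring sequential
 * consistency on a relaxed memory model (our illustration). */
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

atomic_int x, y;   /* shared, initially 0 */
int r1, r2;        /* per-thread results  */

void *t1(void *arg) {
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    /* compiler-inserted fence: orders the store to x before the
     * load of y, forbidding the store->load reordering that a
     * relaxed model would otherwise allow */
    atomic_thread_fence(memory_order_seq_cst);
    r1 = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

void *t2(void *arg) {
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst); /* same role as above */
    r2 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, t1, NULL);
    pthread_create(&b, NULL, t2, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    /* with the fences in place, r1 == 0 && r2 == 0 cannot occur */
    printf("r1 = %d, r2 = %d\n", r1, r2);
    return 0;
}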