Practical Compiler Techniques on Efficient Multithreaded Code Generation for OpenMP Programs

State-of-the-art multiprocessor systems pose several difficulties: (i) the user has to parallelize existing serial code; (ii) explicitly threaded programs written against a thread library are not portable; and (iii) writing efficient multithreaded programs requires intimate knowledge of the machine's architecture and microarchitecture. Well-tuned parallelizing compilers are therefore in high demand in the high-performance computing community, which needs them to exploit recent advances in NUMA-based multiprocessors, simultaneous multithreading processors, and chip-multiprocessor systems. At the same time, OpenMP* has emerged as the industry-standard parallel programming model. With OpenMP, applications can be parallelized with less effort and in a way that is portable across a wide range of multiprocessor systems. In this paper, we present several practical compiler optimization techniques and discuss their effect on the performance of OpenMP programs. We elaborate on the major design considerations in a high-performance OpenMP compiler and present experimental data based on the implementation of these optimizations in the Intel® C++ and Fortran compilers. We also discuss how the OpenMP transformation interacts with other sequential optimizations in the compiler. The techniques in this paper have achieved significant performance improvements on the industry-standard SPEC* OMPM2001 and SPEC* OMPL2001 benchmarks, and we present these performance results for Intel® Pentium® and Itanium® processor-based systems.
