Hybrid MPI/OpenMP Parallel Programming on Clusters of Multi-Core SMP Nodes

Today most systems in high-performance computing (HPC) feature a hierarchical hardware design: shared-memory nodes with several multi-core CPUs are connected via a network infrastructure. Parallel programming must therefore combine distributed-memory parallelization on the node interconnect with shared-memory parallelization inside each node. We describe the potentials and challenges of the dominant programming models on hierarchically structured hardware: pure MPI (Message Passing Interface), pure OpenMP (with distributed shared memory extensions), and hybrid MPI+OpenMP in several flavors. We pinpoint cases where a hybrid programming model can indeed be the superior solution because of reduced communication needs and memory consumption, or improved load balance. Furthermore, we show that machine topology has a significant impact on performance for all parallelization strategies and that topology awareness should be built into all applications in the future. Finally, we give an outlook on possible standardization goals and extensions that could make performance-oriented hybrid programming easier.
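One source of the reduced memory consumption mentioned above can be estimated with simple arithmetic. The sketch below is illustrative only and is not taken from the paper's measurements. It assumes a periodic cubic grid of N^3 points, a ghost-cell (halo) layer of width 1 on the six faces of every subdomain (edges and corners ignored), and a hypothetical cluster of 8 nodes with 8 cores each. Pure MPI runs one rank per core (64 subdomains), whereas hybrid MPI+OpenMP runs one rank per node (8 subdomains) with 8 threads sharing each subdomain. The helper `halo_cells` and all parameter values are hypothetical.

```python
def halo_cells(n: int, p: int) -> int:
    """Total ghost cells for a periodic n^3 grid split into p x p x p equal
    subdomains, halo width 1, faces only (edges/corners ignored)."""
    if n % p:
        raise ValueError("p must divide n evenly")
    side = n // p                      # edge length of one subdomain
    per_domain = 6 * side * side       # one ghost layer on each of six faces
    return p ** 3 * per_domain         # summed over all p^3 subdomains


N = 480  # hypothetical global grid edge length

# Pure MPI: 8 nodes x 8 cores = 64 ranks in a 4x4x4 decomposition.
pure_mpi = halo_cells(N, 4)
# Hybrid MPI+OpenMP: 1 rank per node = 8 ranks (2x2x2), 8 threads each.
hybrid = halo_cells(N, 2)

print(f"pure MPI: {pure_mpi:,} ghost cells")   # pure MPI: 5,529,600 ghost cells
print(f"hybrid:   {hybrid:,} ghost cells")     # hybrid:   2,764,800 ghost cells
print(f"ratio:    {pure_mpi / hybrid:.1f}")    # ratio:    2.0
```

Under these assumptions the total halo volume is 6pN^2, so halving the number of subdomains per dimension halves the ghost-cell memory and the data copied in each halo exchange. Real codes also carry edge/corner cells and non-periodic boundaries. The hybrid model brings its own costs, such as thread synchronization and NUMA placement, which this estimate ignores.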
