An approach of performance comparisons with OpenMP and CUDA parallel programming on multicore systems

In the past, the persistent semiconductor problems of operating temperature and power consumption limited performance growth for single-core microprocessors. Microprocessor vendors therefore adopted multicore chip organizations with parallel processing, because the new technology promised higher speed at lower power. This trend quickly spread from CPU development to other components such as the GPU. Modern GPUs are very efficient at manipulating computer graphics, and their highly parallel structure makes them even more effective than general-purpose CPUs for a range of complex graphics algorithms. The shift to multicore processors, however, has been disruptive for programmers: although a multicore processor offers high performance, parallel processing brings not only opportunity but also challenge. Efficiency, and the way the programmer or compiler explicitly parallelizes the software, are the keys to enhancing performance on a multicore chip. In this paper, we propose a parallel programming approach using hybrid CUDA, OpenMP, and MPI programming. Two verification experiments are presented. In the first, we verify the availability and correctness of auto-parallelization tools and discuss performance issues on CPUs, GPUs, and embedded systems. In the second, we verify how hybrid programming can improve performance. Copyright © 2016 John Wiley & Sons, Ltd.