A Comparative Analysis of Parallel Programming Models for C++

The parallel programming model used in a software development project can significantly affect the way concurrency is expressed in the code. It also entails trade-offs in ease of development, code readability, functionality, runtime overhead, and scalability. Performance in particular can vary with the type of parallelism that suits the problem at hand. We investigate how well three popular multi-tasking frameworks for C++ (Threading Building Blocks, Cilk Plus, and OpenMP 4.0) cope with three of the most common parallel scenarios: recursive divide-and-conquer algorithms, embarrassingly parallel loops, and loops that update shared variables. As test cases for these scenarios we implement merge sort, matrix multiplication, and the dot product, respectively, in each of the programming models. We then go one step further and apply the vectorisation support offered by Cilk Plus and OpenMP 4.0 to the data-parallel aspects of the loop-based algorithms. Our results show that certain configurations expose significant differences in task creation and scheduling overheads among the tested frameworks. We also highlight the importance of testing how well an algorithm scales with the number of hardware threads available to the application.

Keywords—parallel programming models; performance; TBB; Cilk Plus; OpenMP 4.0
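To make the three scenarios concrete, the following is a minimal sketch in C++ that uses only OpenMP 4.0 directives: `task`/`taskwait` for the recursive merge sort, `parallel for` with an inner `simd` loop for matrix multiplication, and `parallel for simd` with a `reduction` clause for the dot product. It is not the code evaluated in the study, which also covers TBB and Cilk Plus. The function names, the row-major matrix layout, and the task cutoff `kSerialCutoff` are hypothetical choices made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch only. Compiled without OpenMP support (e.g. without -fopenmp),
// the pragmas are ignored and every function runs serially.

// Hypothetical cutoff: smaller ranges do not create deferred tasks.
constexpr std::size_t kSerialCutoff = 1024;

// Merge the sorted halves [lo, mid) and [mid, hi) of v via buffer tmp.
static void merge_halves(std::vector<int>& v, std::vector<int>& tmp,
                         std::size_t lo, std::size_t mid, std::size_t hi) {
    std::size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi) tmp[k++] = (v[j] < v[i]) ? v[j++] : v[i++];
    while (i < mid) tmp[k++] = v[i++];
    while (j < hi) tmp[k++] = v[j++];
    for (k = lo; k < hi; ++k) v[k] = tmp[k];
}

// Scenario 1: recursive divide-and-conquer with OpenMP tasks.
static void merge_sort_rec(std::vector<int>& v, std::vector<int>& tmp,
                           std::size_t lo, std::size_t hi) {
    if (hi - lo < 2) return;
    const std::size_t mid = lo + (hi - lo) / 2;
    // Left half as a task; below the cutoff the if clause makes the
    // task undeferred, so it runs immediately in the current thread.
    #pragma omp task shared(v, tmp) if (hi - lo > kSerialCutoff)
    merge_sort_rec(v, tmp, lo, mid);
    merge_sort_rec(v, tmp, mid, hi);
    #pragma omp taskwait
    merge_halves(v, tmp, lo, mid, hi);
}

void parallel_merge_sort(std::vector<int>& v) {
    std::vector<int> tmp(v.size());
    // One thread starts the recursion; the rest of the team picks up
    // tasks while waiting at the implicit barrier of the single construct.
    #pragma omp parallel
    #pragma omp single
    merge_sort_rec(v, tmp, 0, v.size());
}

// Scenario 2: embarrassingly parallel loop over rows of an n x n
// row-major product; each iteration of i writes only its own row of c.
std::vector<double> mat_mul(const std::vector<double>& a,
                            const std::vector<double>& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<double> c(n * n, 0.0);
    #pragma omp parallel for
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            // Contiguous inner loop without cross-iteration dependencies.
            #pragma omp simd
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
    return c;
}

// Scenario 3: loop updating a shared variable. The reduction clause gives
// each thread and SIMD lane a private partial sum, combined at the end.
double dot(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    double sum = 0.0;
    #pragma omp parallel for simd reduction(+ : sum)
    for (std::size_t i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}
```

Because floating-point addition is not associative, a parallel or vectorised `dot` may differ in the last bits from a serial sum. A race-free alternative such as an atomic update of `sum` inside the loop would also be correct, but it serialises every iteration, which is the kind of overhead the shared-variable scenario is meant to expose.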
