Language-based vectorization and parallelization using intrinsics, OpenMP, TBB and Cilk Plus

The aim of this paper is to evaluate OpenMP, TBB and Cilk Plus as basic language-based tools for simple and efficient parallelization of recursively defined computational problems and of other problems that need both task-parallel and data-parallel techniques. We show how to use these parallel programming models to transform the source code of adaptive Simpson's integration into programs that can utilize multiple cores of modern processors. Using the Bellman–Ford algorithm for the single-source shortest path problem as an example, we show how to improve the performance of data-parallel algorithms by tuning data structures for better utilization of the vector extensions of modern processors. Manual vectorization techniques based on Cilk Plus array notation and intrinsics are presented. We also show how such optimization can be simplified using Intel SIMD Data Layout Template containers.
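
To make the task-parallel part concrete, the following is a minimal sketch of how a recursive adaptive Simpson routine might be parallelized with OpenMP tasks. It is not the code from the paper: the function names (`simpson`, `adaptive`, `integrate`), the constants `kMaxDepth` and `kTaskDepth`, and the error test with the factor 15 are illustrative assumptions based on the textbook form of the method. When compiled without OpenMP support, the pragmas are ignored and the code runs serially.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical tuning constants (assumptions, not values from the paper).
static const int kMaxDepth  = 40;  // hard limit on recursion depth
static const int kTaskDepth = 34;  // spawn tasks only while depth > kTaskDepth

// Simpson's rule on [a, b] from the endpoint values fa, fb and midpoint value fm.
static double simpson(double fa, double fm, double fb, double a, double b) {
    return (b - a) / 6.0 * (fa + 4.0 * fm + fb);
}

// Recursive step: split [a, b] in half, compare the two half-interval
// estimates with the whole-interval estimate, and recurse if they disagree.
static double adaptive(double (*f)(double), double a, double b,
                       double fa, double fm, double fb,
                       double whole, double eps, int depth) {
    const double m   = 0.5 * (a + b);
    const double flm = f(0.5 * (a + m));   // midpoint of the left half
    const double frm = f(0.5 * (m + b));   // midpoint of the right half
    const double left  = simpson(fa, flm, fm, a, m);
    const double right = simpson(fm, frm, fb, m, b);
    const double delta = left + right - whole;

    // Textbook stopping criterion with Richardson correction.
    if (depth <= 0 || std::fabs(delta) <= 15.0 * eps)
        return left + right + delta / 15.0;

    double l = 0.0, r = 0.0;
    // The left half becomes a task near the top of the recursion tree;
    // deeper levels run inline to limit task-creation overhead.
    #pragma omp task shared(l) if(depth > kTaskDepth)
    l = adaptive(f, a, m, fa, flm, fm, left, 0.5 * eps, depth - 1);
    r = adaptive(f, m, b, fm, frm, fb, right, 0.5 * eps, depth - 1);
    #pragma omp taskwait
    return l + r;
}

// Driver: one thread starts the recursion, the team executes the tasks.
double integrate(double (*f)(double), double a, double b, double eps) {
    assert(eps > 0.0);
    const double fa = f(a), fb = f(b), fm = f(0.5 * (a + b));
    const double whole = simpson(fa, fm, fb, a, b);
    double result = 0.0;
    #pragma omp parallel
    #pragma omp single
    result = adaptive(f, a, b, fa, fm, fb, whole, eps, kMaxDepth);
    return result;
}
```

For instance, calling `integrate` with a function that returns `std::sin(x)` on the interval from 0 to pi and `eps = 1e-9` should give a value close to 2. The depth cutoff in the `if` clause is one common way to keep tasks coarse-grained; the best threshold depends on the integrand and the machine.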
