Efficient Language-Based Parallelization of Computational Problems Using Cilk Plus

The aim of this paper is to evaluate Cilk Plus as a language-based tool for simple and efficient parallelization of recursively defined computational problems and of other problems that require both task-parallel and data-parallel techniques. We show that existing source code can be easily transformed into programs that utilize multiple cores and, additionally, offload some computations to coprocessors such as the Intel Xeon Phi. We also advise how to improve the simplicity and performance of data-parallel algorithms by tuning data structures to exploit the vector extensions of modern processors. Numerical experiments show that in most cases our Cilk Plus versions of Adaptive Simpson's Integration and the Bellman-Ford algorithm for solving the single-source shortest-path problem achieve better performance than the corresponding OpenMP programs.
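To illustrate the kind of transformation meant here, the following is a minimal sketch of a recursive adaptive Simpson routine in which one recursive call is spawned as a task. It is not the paper's implementation: the names `adaptive_simpson`, `adapt`, `example_integrand` and `MAX_DEPTH`, the error test, and the depth limit are all assumptions made for illustration. The `cilk_spawn` and `cilk_sync` keywords are the standard Cilk Plus ones; when the compiler does not define `__cilk`, they are defined away so that the code compiles as its serial elision.

```c
#include <assert.h>

#ifdef __cilk
#include <cilk/cilk.h>
#else
/* Serial elision: without Cilk Plus support the keywords expand to nothing. */
#define cilk_spawn
#define cilk_sync
#endif

#define MAX_DEPTH 50 /* hypothetical recursion limit, not from the paper */

typedef double (*integrand_fn)(double);

/* Simpson's rule on [a, b] from precomputed endpoint and midpoint values. */
static double simpson(double a, double b, double fa, double fm, double fb)
{
    return (b - a) / 6.0 * (fa + 4.0 * fm + fb);
}

/* Recursive step: refine [a, b] until the two-panel estimate agrees with
   the one-panel estimate `whole` to within the local tolerance eps. */
static double adapt(integrand_fn f, double a, double b,
                    double fa, double fm, double fb,
                    double whole, double eps, int depth)
{
    double m = 0.5 * (a + b);
    double flm = f(0.5 * (a + m));
    double frm = f(0.5 * (m + b));
    double left = simpson(a, m, fa, flm, fm);
    double right = simpson(m, b, fm, frm, fb);
    double delta = left + right - whole;
    double err = delta < 0.0 ? -delta : delta;

    if (depth <= 0 || err <= 15.0 * eps)
        return left + right + delta / 15.0; /* Richardson correction */

    /* The left half may run as a separate task; the right half continues
       in the current strand, and cilk_sync waits for the spawned call. */
    double l = cilk_spawn adapt(f, a, m, fa, flm, fm, left, 0.5 * eps, depth - 1);
    double r = adapt(f, m, b, fm, frm, fb, right, 0.5 * eps, depth - 1);
    cilk_sync;
    return l + r;
}

/* Entry point: integrate f over [a, b] with absolute tolerance eps. */
double adaptive_simpson(integrand_fn f, double a, double b, double eps)
{
    assert(eps > 0.0);
    double fa = f(a), fb = f(b), fm = f(0.5 * (a + b));
    double whole = simpson(a, b, fa, fm, fb);
    return adapt(f, a, b, fa, fm, fb, whole, eps, MAX_DEPTH);
}

/* Hypothetical test integrand: its integral over [0, 1] equals pi/4. */
double example_integrand(double x)
{
    return 1.0 / (1.0 + x * x);
}
```

Calling `adaptive_simpson(example_integrand, 0.0, 1.0, 1e-10)` should return a value close to pi/4. Only two lines differ from the sequential version, which is the sense in which such recursive codes are easy to parallelize; a practical version would likely stop spawning below some interval size to limit task overhead.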
