Frameworks for Multi-core Architectures: A Comprehensive Evaluation Using 2D/3D Image Registration

In recent years, the development of standard processors has shifted from bigger, more complex, and faster cores toward placing several simpler cores on one chip. This shift has also changed how programs are written, since they must now leverage the processing power of multiple cores on the same processor. Initially, programmers had to divide the work by hand, distribute it across the available cores, and manage threads themselves. Today, several frameworks relieve the programmer of these tasks. In this paper, we present five such frameworks for parallelization on shared-memory multi-core architectures: OpenMP, Cilk++, Threading Building Blocks, RapidMind, and OpenCL. To evaluate these frameworks, we investigate a real-world application from medical imaging, 2D/3D image registration. In an empirical study, a fine-grained data-parallel approach and a coarse-grained task-parallel approach are used to evaluate aspects such as usability, performance, and overhead of each framework.
