Wavefront template implementation based on the task programming model

A distinguishing characteristic of the parallel wavefront pattern is the multi-dimensional, streaming nature of its computations, which must obey a dependence pattern. Modern task-based programming libraries such as TBB (Threading Building Blocks) provide features that improve the scalability of this kind of code, but at the cost of leaving some low-level task management details to the programmer. We discuss these low-level task management issues and incorporate them into a high-level TBB-based template, which we present in this paper. The goal of the template is to improve programmer productivity by allowing a non-expert user to code complex wavefront problems easily, without worrying about task creation, synchronization, or scheduling mechanisms. With our template, the user only has to provide a definition file specifying the wavefront dependence pattern and the function that each task has to execute. In addition, we describe our experience with the TBB-based template when coding four complex, real wavefront problems. We find that the user's programming effort is reduced by between 25% and 50%, at the cost of an overhead below 5% when compared with manual TBB implementations of the same problems.
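To make the dependence pattern concrete, the following is a minimal C++ sketch of a 2D wavefront in which each cell waits for its upper and left neighbours. It is not the template described in the paper and does not use TBB: `run_wavefront` and `count_paths` are hypothetical names, and a sequential ready queue stands in for the task scheduler. In a task-based version, one would expect each ready cell to be spawned as a task and the counters to be updated atomically.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical helper (not the paper's API): calls body(i, j) for every cell
// of a rows x cols grid, running (i, j) only after (i-1, j) and (i, j-1).
template <typename Body>
void run_wavefront(std::size_t rows, std::size_t cols, Body body) {
    assert(rows > 0 && cols > 0);
    // counters[i * cols + j] holds the number of unfinished predecessors.
    std::vector<int> counters(rows * cols);
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            counters[i * cols + j] = (i > 0 ? 1 : 0) + (j > 0 ? 1 : 0);

    // Sequential stand-in for a task scheduler: cells whose counter is zero.
    std::deque<std::pair<std::size_t, std::size_t>> ready;
    ready.push_back({0, 0});
    while (!ready.empty()) {
        const std::size_t i = ready.front().first;
        const std::size_t j = ready.front().second;
        ready.pop_front();
        body(i, j);
        // Finishing a cell releases its lower and right neighbours.
        if (i + 1 < rows && --counters[(i + 1) * cols + j] == 0)
            ready.push_back({i + 1, j});
        if (j + 1 < cols && --counters[i * cols + (j + 1)] == 0)
            ready.push_back({i, j + 1});
    }
}

// Example task body: number of monotone lattice paths from the top-left
// cell to the bottom-right cell of an n x n grid.
long count_paths(std::size_t n) {
    std::vector<long> grid(n * n, 0);
    run_wavefront(n, n, [&](std::size_t i, std::size_t j) {
        grid[i * n + j] = (i == 0 || j == 0)
            ? 1
            : grid[(i - 1) * n + j] + grid[i * n + (j - 1)];
    });
    return grid.back();
}
```

In this sketch the user supplies only the per-cell function, while the counter bookkeeping lives in the helper, which mirrors the division of labour the abstract describes.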
