Evaluation of the Task Programming Model in the Parallelization of Wavefront Problems

This paper analyzes the applicability of the task programming model to the parallelization of generic wavefront problems. Computations in this class of problems are characterized by a data dependency pattern across a data space, which produces a variable number of independent tasks as the space is traversed. We argue that it is better to formulate the parallelization of wavefront-based programs in terms of logical tasks rather than threads, for several reasons: more efficient matching of computations to available resources, faster task creation and start-up times, improved load balancing, and higher-level thinking. To implement the parallel wavefront we have used two state-of-the-art task libraries: TBB and OpenMP 3.0. In this work, we highlight the differences between the two implementations, both from the programmer's standpoint and from the performance point of view. To this end, we conduct several experiments to identify the factors that can limit performance in each case. In addition, we present a task-based wavefront template that simplifies the coding of parallel wavefront programs. We have validated this template with three real dynamic programming algorithms, finding that the TBB-coded template always outperforms the OpenMP-based one.
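As an illustration of the dependency pattern described above, the following is a minimal sketch of a task-based wavefront using OpenMP 3.0 tasks. It is not the template presented in the paper. The names `Grid`, `run_cell`, and `wavefront_paths`, the per-cell dependency counters, and the toy recurrence (each cell is the sum of its north and west neighbours) are all assumptions made for this example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical n x n grid: each cell depends on its north and west neighbours.
struct Grid {
    int n;
    std::vector<long long> value;           // computed cell values
    std::vector<std::atomic<int>> pending;  // unmet dependencies per cell

    explicit Grid(int size)
        : n(size),
          value(static_cast<std::size_t>(size) * size, 0),
          pending(static_cast<std::size_t>(size) * size) {
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                pending[i * n + j].store((i > 0) + (j > 0));
    }
};

// Computes cell (i, j), then releases its east and south neighbours.
// A neighbour is spawned as a task only when its last dependency is met.
void run_cell(Grid* g, int i, int j) {
    const int n = g->n;
    const long long north = (i > 0) ? g->value[(i - 1) * n + j] : 0;
    const long long west  = (j > 0) ? g->value[i * n + (j - 1)] : 0;
    g->value[i * n + j] = (i == 0 && j == 0) ? 1 : north + west;

    // fetch_sub returns the previous count: 1 means this was the last dependency.
    if (j + 1 < n && g->pending[i * n + (j + 1)].fetch_sub(1) == 1) {
        #pragma omp task firstprivate(g, i, j)
        run_cell(g, i, j + 1);
    }
    if (i + 1 < n && g->pending[(i + 1) * n + j].fetch_sub(1) == 1) {
        #pragma omp task firstprivate(g, i, j)
        run_cell(g, i + 1, j);
    }
}

// Runs the wavefront from the top-left corner and returns the bottom-right
// value (the number of monotone lattice paths across the grid).
long long wavefront_paths(int n) {
    assert(n > 0);
    Grid g(n);
    Grid* gp = &g;
    #pragma omp parallel
    #pragma omp single
    run_cell(gp, 0, 0);
    // The implicit barrier ending the parallel region waits for all tasks.
    return g.value[static_cast<std::size_t>(n) * n - 1];
}
```

Calling `wavefront_paths(5)` should return 70, the number of monotone paths across a 5 x 5 grid. When compiled without OpenMP support the pragmas are ignored and the same code runs as a serial depth-first traversal. The number of ready tasks grows and then shrinks as the front crosses the grid, which is the variable parallelism the abstract refers to.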
