An Experimental Evaluation of the New OpenMP Tasking Model

The OpenMP standard was conceived to parallelize dense array-based applications, and it has been very successful at that. Recently, a novel tasking proposal to handle unstructured parallelism in OpenMP has been submitted to the OpenMP 3.0 Language Committee. We tested its expressiveness and flexibility by using it to parallelize a number of examples from a variety of application areas. Furthermore, we checked whether the model can be implemented efficiently by evaluating the performance of an experimental implementation of the tasking proposal on an SGI Altix 4700 and comparing it to the performance achieved with Intel's Workqueueing model and with other worksharing alternatives currently available in OpenMP 2.5. We conclude that the new OpenMP tasks allow the expression of parallelism for a broad range of applications and that they will not hamper application performance.
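
To make "unstructured parallelism" concrete, the following is a minimal sketch of two patterns that loop worksharing does not express well: a pointer-chasing list traversal and a recursive computation. It assumes the `task` and `taskwait` syntax as later standardized in OpenMP 3.0, which may differ in detail from the proposal evaluated here. The names `node_t`, `square_list`, `fib`, and `fib_tasks` are hypothetical and chosen only for illustration; they are not taken from the benchmarks in this work.

```c
#include <stddef.h>

/* Hypothetical list node, used only for illustration. */
typedef struct node {
    long value;
    struct node *next;
} node_t;

/* Pointer chasing: the trip count is unknown up front, so one thread
 * walks the list and creates one task per node. */
void square_list(node_t *head)
{
    #pragma omp parallel
    {
        #pragma omp single
        {
            for (node_t *p = head; p != NULL; p = p->next) {
                /* Each task captures its own copy of the pointer p. */
                #pragma omp task firstprivate(p)
                p->value = p->value * p->value;
            }
        }
        /* The implicit barrier after single waits for all tasks. */
    }
}

/* Recursion: each call spawns two child tasks and waits for them. */
static long fib(int n)
{
    long a, b;
    if (n < 2)
        return n;
    #pragma omp task shared(a)
    a = fib(n - 1);
    #pragma omp task shared(b)
    b = fib(n - 2);
    #pragma omp taskwait   /* a and b are valid only after this point */
    return a + b;
}

/* Entry point: one thread starts the recursion, the team runs the tasks. */
long fib_tasks(int n)
{
    long result = 0;
    #pragma omp parallel
    {
        #pragma omp single
        result = fib(n);
    }
    return result;
}
```

Compiled without OpenMP support, the pragmas are ignored and both functions run serially with the same results. A practical version would likely add a cutoff so that small subproblems are not turned into tasks, since task creation overhead would otherwise dominate.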
