Evaluation of Runtime Cut-off Approaches for Parallel Programs

Parallel programs have the potential of executing several times faster than sequential programs. However, in order to achieve its potential, several aspects of the execution have to be parameterized, such as the number of threads, task granularity, etc. This work studies the task granularity of regular and irregular parallel programs on symmetrical multicore machines. Task granularity is how many parallel tasks are created to perform a certain computation. If the granularity is too coarse, there might not be enough parallelism to occupy all processors. But if granularity is too fine, a large percentage of the execution time may be spent context switching between tasks, and not performing useful work.

[1]  Stephen L. Olivier,et al.  Comparison of OpenMP 3.0 and Other Task Parallel Frameworks on Unbalanced Task Graphs , 2010, International Journal of Parallel Programming.

[2]  Constantine D. Polychronopoulos,et al.  Symbolic Analysis: A Basis for Parallelization, Optimization, and Scheduling of Programs , 1993, LCPC.

[3]  Paulo Marques,et al.  Concurrency by default: using permissions to express dataflow in stateful programs , 2009, OOPSLA Companion.

[4]  Matteo Frigo,et al.  The implementation of the Cilk-5 multithreaded language , 1998, PLDI.

[5]  L. Dagum,et al.  OpenMP: an industry standard API for shared-memory programming , 1998 .

[6]  Alejandro Duran,et al.  Evaluation of OpenMP Task Scheduling Strategies , 2008, IWOMP.

[7]  Robert H. Halstead,et al.  Lazy task creation: a technique for increasing the granularity of parallel programs , 1990, LISP and Functional Programming.

[8]  Alejandro Duran,et al.  Barcelona OpenMP Tasks Suite: A Set of Benchmarks Targeting the Exploitation of Task Parallelism in OpenMP , 2009, 2009 International Conference on Parallel Processing.

[9]  Guy E. Blelloch,et al.  Brief announcement: the problem based benchmark suite , 2012, SPAA '12.

[10]  Arthur Charguéraud,et al.  Oracle scheduling: controlling granularity in implicitly parallel languages , 2011, OOPSLA '11.

[11]  L.A. Smith,et al.  A Parallel Java Grande Benchmark Suite , 2001, ACM/IEEE SC 2001 Conference (SC'01).

[12]  Lieven Eeckhout,et al.  Statistically rigorous java performance evaluation , 2007, OOPSLA.

[13]  Stephen L. Olivier,et al.  Evaluating OpenMP 3.0 Run Time Systems on Unbalanced Task Graphs , 2009, IWOMP.

[14]  Alejandro Duran,et al.  An adaptive cut-off for task parallelism , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.

[15]  Doug Lea,et al.  A Java fork/join framework , 2000, JAVA '00.

[16]  Bradley C. Kuszmaul,et al.  Cilk: an efficient multithreaded runtime system , 1995, PPOPP '95.

[17]  Christian Bienia,et al.  Benchmarking modern multiprocessors , 2011 .