Evaluating OpenMP 3.0 Run Time Systems on Unbalanced Task Graphs

The UTS benchmark is used to evaluate task parallelism in OpenMP 3.0 as implemented in a number of recently released compilers and run-time systems. UTS performs a parallel search of an irregular and unpredictable search space, such as those that arise in combinatorial optimization problems. As such, UTS presents a highly unbalanced task graph that challenges scheduling, load balancing, termination detection, and task coarsening strategies. Scalability and overheads are compared for three implementations of the benchmark: OpenMP 3.0 tasks, Cilk, and an OpenMP version without tasks that performs all scheduling, load balancing, and termination detection explicitly. Current OpenMP 3.0 implementations generally perform poorly on the UTS benchmark.
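
To make the workload concrete, the following is a minimal sketch in C of an unbalanced tree traversal written with OpenMP 3.0 tasks. It is not the UTS source code. The names `mix`, `num_children`, `count_subtree`, and `count_tree` are hypothetical; the integer mixing function is an assumed stand-in for the hash-based random number generation used by UTS; and the depth cap `max_depth` and the limit of three children per node are illustrative choices that keep the tree finite.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CHILDREN 3

/* Assumption: a 64-bit integer mixer standing in for the hash that
   derives each child's state from its parent's state. */
static uint64_t mix(uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

/* The child count depends only on the node's own state, so the shape of
   the tree is deterministic but cannot be predicted before the search. */
static int num_children(uint64_t state, int depth, int max_depth) {
    if (depth >= max_depth) return 0;            /* illustrative depth cap */
    return (int)(mix(state) % (MAX_CHILDREN + 1)); /* 0..3 children */
}

/* Count the nodes in the subtree rooted at `state`,
   spawning one OpenMP task per child. */
static long count_subtree(uint64_t state, int depth, int max_depth) {
    int n = num_children(state, depth, max_depth);
    long counts[MAX_CHILDREN] = {0, 0, 0};
    assert(n >= 0 && n <= MAX_CHILDREN);

    for (int i = 0; i < n; i++) {
        uint64_t child = mix(state ^ (uint64_t)(i + 1));
        /* counts must be shared so the child task writes into this frame;
           i, child, depth and max_depth are captured by value. */
        #pragma omp task shared(counts) firstprivate(i, child)
        counts[i] = count_subtree(child, depth + 1, max_depth);
    }
    /* Wait for all child tasks before reading their results. */
    #pragma omp taskwait

    long total = 1; /* this node */
    for (int i = 0; i < n; i++) total += counts[i];
    return total;
}

/* One thread creates the root task; the rest of the team executes
   whatever tasks the run-time system hands to them. */
long count_tree(uint64_t root_seed, int max_depth) {
    long total = 0;
    #pragma omp parallel
    #pragma omp single
    total = count_subtree(root_seed, 0, max_depth);
    return total;
}
```

Calling `count_tree(42, 12)` from a program compiled with an OpenMP flag such as `-fopenmp` traverses the tree in parallel; without the flag the pragmas are ignored and the same code runs serially. Because subtree sizes vary widely and each node does very little work, the run-time system has to balance load and bound task-creation overhead on its own, which is the behavior the benchmark is designed to stress.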
