Hierarchical Parallel Dynamic Dependence Analysis for Recursively Task-Parallel Programs

This work presents a hierarchical, parallel, dynamic dependence analysis for inferring run-time dependencies between recursively parallel tasks in the OmpSs programming model. To evaluate the dependence analysis, we implement PARTEE, a scalable runtime system that supports implicit synchronization between nested parallel tasks. We evaluate the performance of the resulting runtime system and compare it to Nanos++, the state-of-the-art OmpSs implementation, and to Cilk, a high-performance task-parallel runtime system without implicit task synchronization. We find that (i) PARTEE can handle finer-grained tasks than Nanos++, (ii) PARTEE's performance is comparable to that of Cilk, and (iii) in cases where task dependencies are irregular, PARTEE outperforms Cilk by up to 103%.
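The abstract names the analysis without describing its mechanism. As a rough, non-authoritative illustration of what a dynamic dependence analysis does in general, the sketch below infers read-after-write, write-after-write and write-after-read dependencies between sibling tasks from their declared read and write footprints, and gives every task its own domain in which its children are analyzed. The `Task` and `Domain` classes, the use of strings as object names, and the sequential bookkeeping are all hypothetical simplifications; this is not PARTEE's hierarchical, parallel algorithm.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


class Domain:
    """Hypothetical dependence domain for the children of one parent task.

    Per object it remembers the last writer and the readers since that
    write. Sequential and simplified, for illustration only.
    """

    def __init__(self) -> None:
        self.last_writer: Dict[str, str] = {}
        self.readers: Dict[str, Set[str]] = {}

    def spawn(self, task: "Task") -> "Task":
        # 1. Compute predecessors from the state before this task is recorded.
        for obj in task.ins | task.outs:
            writer = self.last_writer.get(obj)
            if writer is not None:
                task.deps.add(writer)  # read-after-write or write-after-write
        for obj in task.outs:
            task.deps |= self.readers.get(obj, set())  # write-after-read
        # 2. Record this task's accesses so later siblings can see them.
        for obj in task.ins - task.outs:
            self.readers.setdefault(obj, set()).add(task.name)
        for obj in task.outs:
            self.last_writer[obj] = task.name
            self.readers[obj] = set()
        return task


@dataclass
class Task:
    name: str
    ins: Set[str] = field(default_factory=set)   # objects the task only reads
    outs: Set[str] = field(default_factory=set)  # objects it writes (out/inout)
    deps: Set[str] = field(default_factory=set)  # inferred predecessor tasks
    # Each task owns a fresh domain in which its own children are analyzed.
    children: Domain = field(default_factory=Domain, repr=False)


# Usage: three siblings under "main", plus one nested child of A.
root = Task("main")
a = root.children.spawn(Task("A", outs={"x"}))
b = root.children.spawn(Task("B", ins={"x"}))
c = root.children.spawn(Task("C", ins={"y"}, outs={"x"}))
a1 = a.children.spawn(Task("A1", outs={"x"}))
print(sorted(b.deps), sorted(c.deps), sorted(a1.deps))
```

Here B waits for A (read-after-write), C waits for both A and B (write-after-write and write-after-read on `x`), and A1 has no predecessors because it is analyzed only against its own siblings inside A's domain. Keeping one domain per parent is what makes such an analysis hierarchical in this sketch: dependencies are resolved among siblings, and nested tasks never need to consult the state of unrelated subtrees.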
