BDDT: block-level dynamic dependence analysis for deterministic task-based parallelism

Reasoning about synchronization, ordering, and conflicting memory accesses makes parallel programs difficult to write and hard to test, debug, and maintain. Task-parallel programming models such as OpenMP, Cilk, and Sequoia offer a more structured way of expressing parallelism than threads, but still require the programmer to find and enforce ordering and memory dependencies among tasks by hand. Programming models with implicit parallelism, such as SvS, OoOJava, and StarSs, lift this limitation by inferring parallelism and dependencies automatically, provided the programmer describes the memory footprint of each task. These systems currently restrict task footprints to whole, isolated program objects, one-dimensional array ranges, or static compile-time regions, all of which produce over-approximations and false dependencies that reduce the available parallelism in the program. This paper presents BDDT, a task-parallel runtime system that dynamically discovers and resolves dependencies among parallel tasks. BDDT allows the programmer to specify detailed task footprints on any memory address range or multidimensional array tile. BDDT uses a block-based dependence analysis with arbitrary granularity, making it easier to apply to
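To make the idea of block-based dependence analysis concrete, the following is a minimal sketch, not BDDT's actual implementation. It assumes memory is split into fixed-size blocks and that each task declares its footprint as address ranges tagged as input or output. The names `BlockTracker`, `submit`, and `BLOCK_SIZE` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

BLOCK_SIZE = 64  # hypothetical block granularity, in bytes


class BlockTracker:
    """Toy per-block dependence tracker (illustrative assumption only)."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.last_writer = {}            # block index -> id of last writing task
        self.readers = defaultdict(set)  # block index -> readers since last write

    def blocks(self, start, length):
        """Return the indices of all blocks touched by [start, start + length)."""
        first = start // self.block_size
        last = (start + length - 1) // self.block_size
        return range(first, last + 1)

    def submit(self, task, footprint):
        """Register a task and return the set of tasks it must wait for.

        footprint is a list of (start, length, mode), mode being "in" or "out".
        """
        deps = set()
        for start, length, mode in footprint:
            for b in self.blocks(start, length):
                writer = self.last_writer.get(b)
                if writer is not None:
                    deps.add(writer)          # read-after-write or write-after-write
                if mode == "out":
                    deps |= self.readers[b]   # write-after-read
                    self.last_writer[b] = task
                    self.readers[b] = set()
                else:
                    self.readers[b].add(task)
        deps.discard(task)  # a task never depends on itself
        return deps
```

In this sketch, a task that reads a block waits for that block's last writer, and a task that writes a block also waits for every reader registered since the last write. Tasks whose footprints touch disjoint blocks get no dependence and may run in parallel. The block size controls precision: two tasks that touch different bytes of the same block are still ordered, which is the kind of false dependence that a finer granularity avoids.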
