Language support for dynamic, hierarchical data partitioning

Applications written for distributed-memory parallel architectures must partition their data to enable parallel execution. As memory hierarchies become deeper, the data partitioning must increasingly be hierarchical as well. Current language proposals perform this hierarchical partitioning statically, which excludes many important applications where the appropriate partitioning is itself data dependent and so must be computed dynamically. We describe Legion, a region-based programming system in which each region may be partitioned into subregions. Partitions are computed dynamically and are fully programmable. The division of data need not be disjoint: subregions of a region may overlap, or alias, one another. Computations use regions with certain privileges (e.g., expressing that a computation uses a region read-only) and data coherence (e.g., expressing that the computation need only be atomic with respect to other operations on the region), both of which can be controlled on a per-region (or per-subregion) basis. We present the novel aspects of the Legion design, in particular the combination of static and dynamic checks used to enforce soundness. We give an extended example illustrating how Legion can express computations with dynamically determined relationships between computations and data partitions. We prove the soundness of Legion's type system, and show that Legion type checking improves performance by up to 71% by eliding provably safe memory checks. In particular, we show that the dynamic checks that detect aliasing at runtime at region granularity have negligible overhead. We report results for three real-world applications running on distributed-memory machines, achieving up to 62.5X speedup on 96 GPUs on the Keeneland supercomputer.
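The ideas of data-dependent partitioning, aliased subregions, and privileges can be illustrated with a small sketch. This is plain Python, not Legion's API: the names `partition`, `is_disjoint`, and `can_run_in_parallel`, the use of sets of integers as regions, and the "read"/"write" privilege strings are all assumptions made for illustration only.

```python
from typing import Callable, Dict, Hashable, Set

Region = Set[int]  # hypothetical stand-in: a region is a set of element ids


def partition(region: Region,
              colors_of: Callable[[int], Set[Hashable]]) -> Dict[Hashable, Region]:
    """Split a region into subregions using a function evaluated at run time.

    An element may receive several colors, so subregions may overlap (alias).
    """
    subregions: Dict[Hashable, Region] = {}
    for elem in region:
        for color in colors_of(elem):
            subregions.setdefault(color, set()).add(elem)
    return subregions


def is_disjoint(subregions: Dict[Hashable, Region]) -> bool:
    """Dynamic check at subregion granularity: do any two subregions overlap?"""
    seen: Region = set()
    for elems in subregions.values():
        if seen & elems:
            return False
        seen |= elems
    return True


def can_run_in_parallel(a: Region, priv_a: str, b: Region, priv_b: str) -> bool:
    """Two computations are independent if their regions do not overlap,
    or if both only read (illustrative rule, not Legion's full semantics)."""
    return a.isdisjoint(b) or (priv_a == "read" and priv_b == "read")


nodes: Region = set(range(8))

# Disjoint partition: each node is owned by exactly one of two pieces.
owned = partition(nodes, lambda e: {e // 4})

# Aliased partition: boundary nodes 3 and 4 are visible to both pieces.
ghost = partition(nodes, lambda e: {0, 1} if e in (3, 4) else {e // 4})
```

Under these assumptions, `owned` passes the disjointness check, so writers to `owned[0]` and `owned[1]` may proceed in parallel, while `ghost[0]` and `ghost[1]` share nodes 3 and 4 and are only safe to use concurrently when both computations read.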
