Legion: Expressing locality and independence with logical regions

Modern parallel architectures have both heterogeneous processors and deep, complex memory hierarchies. We present Legion, a programming model and runtime system for achieving high performance on these machines. Legion is organized around logical regions, which express both locality and independence of program data, and tasks, functions that perform computations on regions. We describe a runtime system that dynamically extracts parallelism from Legion programs, using a distributed, parallel scheduling algorithm that identifies both independent tasks and nested parallelism. Legion also enables explicit, programmer-controlled movement of data through the memory hierarchy and placement of tasks based on locality information via a novel mapping interface. We evaluate our Legion implementation on three applications: a fluid-flow simulation on a regular grid, a three-level AMR code solving a heat-diffusion equation, and a circuit simulation.
