Structure Slicing: Extending Logical Regions with Fields

Applications on modern supercomputers are increasingly limited by the cost of data movement, but mainstream programming systems have few abstractions for describing the structure of a program's data. Consequently, the burden of managing data movement, placement, and layout currently falls primarily upon the programmer. To address this problem we previously proposed a data model based on logical regions and described Legion, a programming system incorporating logical regions. In this paper, we present structure slicing, which incorporates fields into the logical region data model. We show that structure slicing enables Legion to automatically infer task parallelism from field non-interference, decouple the specification of data usage from layout, and reduce the overall amount of data moved. We demonstrate that structure slicing enables both strong and weak scaling of three Legion applications including S3D, a production combustion simulation that uses logical regions with thousands of fields, with speedups of up to 3.68X over a vectorized CPU-only Fortran implementation and 1.88X over an independently hand-tuned OpenACC code.

[1]  Irving L. Traiger,et al.  Granularity of Locks and Degrees of Consistency in a Shared Data Base , 1998, IFIP Working Conference on Modelling in Data Base Management Systems.

[2]  E. F. Codd,et al.  A relational model of data for large shared data banks , 1970, CACM.

[3]  David A. Padua,et al.  Advanced compiler optimizations for supercomputers , 1986, CACM.

[4]  Michael Burrows,et al.  Eraser: a dynamic data race detector for multi-threaded programs , 1997, TOCS.

[5]  L. Dagum,et al.  OpenMP: an industry standard API for shared-memory programming , 1998 .

[6]  James R. Larus,et al.  Cache-conscious structure definition , 1999, PLDI '99.

[7]  Steven J. Deitz,et al.  Abstractions for dynamic data distribution , 2004, Ninth International Workshop on High-Level Parallel Programming Models and Supportive Environments, 2004. Proceedings..

[8]  Michael Stonebraker,et al.  C-Store: A Column-oriented DBMS , 2005, VLDB.

[9]  Bradford L. Chamberlain,et al.  Parallel Programmability and the Chapel Language , 2007, Int. J. High Perform. Comput. Appl..

[10]  Scott Klasky,et al.  Terascale direct numerical simulations of turbulent combustion using S3D , 2008 .

[11]  Christian Bienia,et al.  PARSEC 2.0: A New Benchmark Suite for Chip-Multiprocessors , 2009 .

[12]  Adam Welc,et al.  Safe nondeterminism in a deterministic-by-default parallel language , 2011, POPL '11.

[13]  Kevin Skadron,et al.  Dymaxion: Optimizing memory access patterns for heterogeneous systems , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[14]  Alexander Aiken,et al.  Data representation synthesis , 2011, PLDI '11.

[15]  Karsten Schwan,et al.  Keeneland: Bringing Heterogeneous GPU Computing to the Computational Science Community , 2011, Computing in Science & Engineering.

[16]  Alexander Aiken,et al.  Legion: Expressing locality and independence with logical regions , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.

[17]  Ray W. Grout,et al.  Hybridizing S3D into an Exascale application using OpenACC: An approach for moving to multi-petaflops and beyond , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.

[18]  Robert Strzodka Data layout optimization for multi-valued containers in OpenCL , 2012, J. Parallel Distributed Comput..

[19]  M. Pharr,et al.  ispc: A SPMD compiler for high-performance CPU programming , 2012, 2012 Innovative Parallel Computing (InPar).

[20]  Alexander Aiken,et al.  Language support for dynamic, hierarchical data partitioning , 2013, OOPSLA.

[21]  Michael Garland,et al.  A decomposition for in-place matrix transposition , 2014, PPoPP '14.

[22]  Alexander Aiken,et al.  Realm: An event-based low-level runtime for distributed memory architectures , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).